that bind to activated GPCRs in the absence of G-protein coupled receptor kinases which may be limiting; and engineering mutant
super arrestin proteins that have an increased affinity for activated GPCRs with or without phosphorylation. These methods are
intended to increase the robustness of the GPCR/ICAST technology in situations in which G-protein coupled receptor kinases are
absent or limiting, or in which the GPCR is not efficiently down-regulated or is rapidly resensitized (thus having a labile interaction
with arrestin). Included are also more specific methods for using ICAST complementary enzyme fragments to monitor GPCR
homo- and hetero-dimerization with applications for drug lead discovery and ligand and function discovery for orphan GPCRs.
TITLE OF THE INVENTION

IMPROVED SYSTEMS FOR SENSITIVE DETECTION OF G-PROTEIN 
COUPLED RECEPTOR AND ORPHAN RECEPTOR FUNCTION 
USING REPORTER ENZYME MUTANT COMPLEMENTATION 

BACKGROUND OF THE INVENTION 

This application is a continuation-in-part of U.S. Application Serial No.
09/654,499, filed September 1, 2000, which claims the benefit of Provisional
Application Serial No. 60/180,669, filed February 7, 2000. The entireties of U.S.
Application Serial No. 09/654,499 and Provisional Application Serial No.
60/180,669 are incorporated herein by reference.

Field of the Invention 

The present invention relates to methods of detecting G-protein-coupled
receptor (GPCR) activity, and provides methods of assaying GPCR activity,
methods for screening for GPCR ligands, agonists and/or antagonists, methods for
screening natural and surrogate ligands for orphan GPCRs, and methods for
screening compounds that interact with components of the GPCR regulatory
process.

Background of the Technology 

The actions of many extracellular signals are mediated by the interaction of
G-protein-coupled receptors (GPCRs) and guanine nucleotide-binding regulatory
proteins (G-proteins). G-protein-mediated signaling systems have been identified in
many divergent organisms, such as mammals and yeast. The GPCRs represent a
large superfamily of proteins which have divergent amino acid sequences, but
share common structural features, in particular, the presence of seven
transmembrane helical domains. GPCRs respond to, among other extracellular
signals, neurotransmitters, hormones, odorants and light. Individual GPCR types
activate a particular signal transduction pathway; at least ten different signal
transduction pathways are known to be activated via GPCRs. For example, the
beta 2-adrenergic receptor (β2AR) is a prototype mammalian GPCR. In response
to agonist binding, β2AR receptors activate a G-protein (Gs) which in turn
stimulates adenylate cyclase activity and results in increased cyclic adenosine
monophosphate (cAMP) production in the cell.

The signaling pathway and final cellular response that result from GPCR
stimulation depend on the specific class of G-protein with which the particular
receptor is coupled (Hamm, "The Many Faces of G-Protein Signaling." J. Biol.
Chem., 273:669-672 (1998)). For instance, coupling to the Gs class of G-proteins
stimulates cAMP production and activation of the Protein Kinase A and C
pathways, whereas coupling to the Gi class of G-proteins down-regulates cAMP.
Other second messenger systems such as calcium, phospholipase C, and
phosphatidylinositol 3 may also be utilized. As a consequence, GPCR signaling
events have predominantly been measured via quantification of these second
messenger products.

The decrease of a response to a persistent stimulus is a widespread
biological phenomenon. Signaling by diverse GPCRs is believed to be terminated
by a uniform two-step mechanism. Activated receptor is first phosphorylated by a
GPCR kinase (GRK). An arrestin protein binds to the activated and
phosphorylated receptor, thus blocking G-protein interaction. This process is
commonly referred to as desensitization, a general mechanism that has been
demonstrated in a variety of functionally diverse GPCRs. Arrestin also plays a part
in regulating GPCR internalization and resensitization, processes that are
heterogeneous among different GPCRs (Oakley, et al., J. Biol. Chem., 274:32248-
32257 (1999)). The interaction between an arrestin and GPCR in processes of
internalization and resensitization is dictated by the specific sequence motif in the
carboxyl terminus of a given GPCR. Only a subset of GPCRs, which possess
clusters of three serine or threonine residues at the carboxyl termini, were found to
co-traffic with the arrestins into the endocytic vesicles after ligand stimulation.
The number of receptor kinases and arrestins involved in desensitization of GPCRs
is rather limited.

A common feature of GPCR physiology is desensitization and recycling of
the receptor through the processes of receptor phosphorylation, endocytosis and
dephosphorylation (Ferguson, et al., "G-protein-coupled receptor regulation: role of
G-protein-coupled receptor kinases and arrestins." Can. J. Physiol. Pharmacol.,
74:1095-1110 (1996)). Ligand-occupied GPCRs can be phosphorylated by two
families of serine/threonine kinases, the G-protein-coupled receptor kinases
(GRKs) and the second messenger-dependent protein kinases such as protein
kinase A and protein kinase C. Phosphorylation by either class of kinases serves to
down-regulate the receptor by uncoupling it from its corresponding G-protein.
GRK-phosphorylation also serves to down-regulate the receptor by recruitment of a
class of proteins known as the arrestins that bind the cytoplasmic domain of the
receptor and promote clustering of the receptor into endocytic vesicles. Once the
receptor is endocytosed, it will either be degraded in lysosomes or
dephosphorylated and recycled back to the plasma membrane as a fully functional
receptor.

Binding of an arrestin protein to an activated receptor has been documented
as a common phenomenon of a variety of GPCRs ranging from rhodopsin to β2AR
to the neurotensin receptor (Barak, et al., "A β-arrestin/Green Fluorescent Fusion
Protein Biosensor for Detecting G-Protein-Coupled Receptor Activation," J. Biol.
Chem., 272:27497-500 (1997)). Consequently, monitoring arrestin interaction with
a specific GPCR can be utilized as a generic tool for measuring GPCR activation.
Similarly, a single G-protein and GRK also partner with a variety of receptors
(Hamm, et al. (1998) and Pitcher et al., "G-Protein-Coupled Receptor Kinases,"
Annu. Rev. Biochem., 67:653-92 (1998)), such that these protein/protein
interactions may also be monitored to determine receptor activity.

Many therapeutic drugs in use today target GPCRs, as they regulate vital
physiological responses, including vasodilation, heart rate, bronchodilation,
endocrine secretion and gut peristalsis. See, e.g., Lefkowitz et al., Annu. Rev.
Biochem., 52:159 (1983). Some of these drugs mimic the ligand for this receptor.
Other drugs act to antagonize the receptor in cases when disease arises from
spontaneous activity of the receptor.

Efforts such as the Human Genome Project are identifying new GPCRs
("orphan" receptors) whose physiological roles and ligands are unknown. It is 
estimated that several thousand GPCRs exist in the human genome. 

Various approaches have been used to monitor intracellular activity in
response to a stimulant, e.g., enzyme-linked immunosorbent assay (ELISA);
Fluorescence Imaging Plate Reader assay (FLIPR™, Molecular Devices Corp.,
Sunnyvale, CA); EVOscreen™, EVOTEC™, Evotec Biosystems GmbH, Hamburg,
Germany; and techniques developed by CELLOMICS™, Cellomics, Inc.,
Pittsburgh, PA.

Germino et al., "Screening for in vivo protein-protein interactions." Proc.
Natl. Acad. Sci., 90(3):933-937 (1993), discloses an in vivo approach for the
isolation of proteins interacting with a protein of interest.

Phizicky et al., "Protein-protein interactions: methods for detection and
analysis." Microbiol. Rev., 59(1):94-123 (1995), discloses a review of
biochemical, molecular biological and genetic methods used to study
protein-protein interactions.

Offermanns et al., "Gα15 and Gα16 Couple a Wide Variety of Receptors to
Phospholipase C." J. Biol. Chem., 270(25):15175-15180 (1995), discloses that
Gα15 and Gα16 can be activated by a wide variety of G-protein-coupled receptors.
The selective coupling of an activated receptor to a distinct pattern of G-proteins is
regarded as an important requirement to achieve accurate signal transduction. Id.

Barak et al., "A β-arrestin/Green Fluorescent Protein Biosensor for
Detecting G Protein-coupled Receptor Activation." J. Biol. Chem., 272(44):27497-
27500 (1997) and U.S. Patents Nos. 5,891,646 and 6,110,693 disclose the use of a
β-arrestin/green fluorescent protein (GFP) fusion protein for imaging protein
translocation upon stimulation of GPCR with optical devices.

Each of the references described above has drawbacks. For example,

• The prior art methodologies require over-expression of the proteins,
which could cause artifacts and tip the balance of cellular regulatory
machineries.

• The prior art visualization or imaging assays are low throughput and
lack thorough quantification. Therefore, they are not suitable for
high throughput pharmacological and kinetic assays.

In addition, many of the prior art assays require isolation of the GPCR rather than
observation of the GPCR in a cell. There thus exists a need for improved methods
for monitoring GPCR function.

SUMMARY OF THE INVENTION

The present invention provides modifications to the disclosure in U.S.
Application Serial No. 09/654,499. In particular, the present invention is directed
to modifications of the aspects of the invention described below to further enhance
assay sensitivity. The modifications include the use of genetically modified arrestins that
exhibit enhanced binding to activated GPCR regardless of whether the GPCR is
phosphorylated or non-phosphorylated; the use of a serine/threonine cluster
strategy to facilitate screening assays for orphan receptors that do not possess this
structural motif on their own; and the use of a combination of the above
modifications to achieve even more enhanced detection.

A first aspect of the present invention is a method that monitors GPCR
function proximally at the site of receptor activation, thus providing more
information for drug discovery purposes due to fewer competing mechanisms.
Activation of the GPCR is measured by a read-out for interaction of the receptor
with a regulatory component such as arrestin, G-protein, GRK or other kinases, the
binding of which to the receptor is dependent upon agonist occupation of the
receptor. The present invention involves the detection of protein/protein
interaction by complementation of mutant reporter enzymes.

Binding of arrestin to activated GPCR is a common process in the first step
of desensitization that has been demonstrated for most, if not all, GPCRs studied so
far. Measurement of GPCR interaction with arrestin via mutant enzyme
complementation (i.e., ICAST) provides a more generic assay technology
applicable for a wide variety of GPCRs and orphan receptors.

A further aspect of the present invention is a method of assessing GPCR
pathway activity under test conditions by providing a test cell that expresses a
GPCR, e.g., muscarinic, adrenergic, dopamine, angiotensin or endothelin, as a
fusion protein to a mutant reporter enzyme and an interacting protein in the GPCR
pathway, e.g., G-protein, arrestin or GRK, as a fusion protein with a
complementing mutant reporter enzyme. When test cells are exposed to a known
agonist to the target GPCR under test conditions, activation of the GPCR will be
monitored by complementation of the reporter enzyme. Increased reporter enzyme
activity reflects interaction of the GPCR with its interacting protein partner.

A further aspect of the present invention is a method of assessing GPCR
pathway activity in the presence of a test arrestin, e.g., β-arrestin.

A further aspect of the present invention is a method of assessing GPCR
pathway activity in the presence of a test G-protein.

A further aspect of the present invention is a method of assessing GPCR 
pathway activity upon exposure of the test cell to a test ligand. 

A further aspect of the present invention is a method of assessing GPCR
activity upon co-expression in the test cell of a second receptor. The second
receptor could be the same GPCR or orphan receptor (i.e., homo-dimerization), a
different GPCR or orphan receptor (i.e., hetero-dimerization) or could be a receptor
of another type.

A further aspect of the present invention is a method for screening for a
ligand or agonist to an orphan GPCR. The ligand or agonist could be contained in
natural or synthetic libraries or mixtures or could be a physical stimulus. A test
cell is provided that expresses the orphan GPCR as a fusion protein with a mutant
reporter enzyme, e.g., a β-galactosidase mutant, and, for example, an arrestin or
mutant form of arrestin as a fusion protein with a complementing mutant reporter
enzyme, e.g., another β-galactosidase mutant. The interaction of the arrestin with
the orphan GPCR upon receptor activation is measured by enzymatic activity of the
complemented reporter enzyme. The test cell is exposed to a test compound, and
an increase in reporter enzyme activity indicates the presence of a ligand or agonist.

A further aspect of the present invention is a method for screening a
protein of interest, for example, an arrestin protein (or mutant form of the arrestin
protein) for the ability to bind to a phosphorylated, or activated, GPCR. A test cell
is provided that expresses a GPCR as a fusion protein with a mutant reporter
enzyme, e.g., a β-galactosidase mutant, and contains arrestin (or a mutant form of
arrestin) as a fusion protein with a complementing mutant reporter enzyme, e.g.,
another β-galactosidase mutant. The interaction of arrestin with the GPCR upon
receptor activation is measured by enzymatic activity of the complemented reporter
enzyme. The test cell is exposed to a known GPCR agonist and then reporter
enzyme activity is detected. Increased reporter enzyme activity indicates that the
β-arrestin molecule can bind to phosphorylated, or activated, GPCR in the test cell.

A further aspect of the present invention is a method to screen for an
agonist to a specific GPCR. The agonist could be contained in natural or synthetic
libraries or could be a physical stimulus. A test cell is provided that expresses a
GPCR as a fusion protein with a mutant reporter enzyme, e.g., a β-galactosidase
mutant, and, for example, an arrestin as a fusion protein with a complementing
mutant reporter enzyme, e.g., another β-galactosidase mutant. The interaction of
arrestin with the GPCR upon receptor activation is measured by enzymatic activity
of the complemented reporter enzyme. The test cell is exposed to a test compound,
and an increase in reporter enzyme activity indicates the presence of an agonist.
The test cell may express a known GPCR or a variety of known GPCRs, or may
express an unknown GPCR or a variety of unknown GPCRs. The GPCR may be,
for example, an odorant GPCR or a βAR GPCR.

A further aspect of the present invention is a method for screening a test
compound for GPCR antagonist activity. A test cell is provided that expresses a
GPCR as a fusion protein with a mutant reporter enzyme, e.g., a β-galactosidase
mutant, and, for example, an arrestin as a fusion protein with a complementing
mutant reporter enzyme, e.g., another β-galactosidase mutant. The interaction of
arrestin with the GPCR upon receptor activation is measured by enzymatic activity
of the complemented reporter enzyme. The test cell is exposed to a test compound,
and an increase in reporter enzyme activity indicates the presence of an agonist.
The cell is exposed to a test compound and to a GPCR agonist, and reporter
enzyme activity is detected. When exposure to the agonist occurs at the same time
as or subsequent to exposure to the test compound, a decrease in reporter enzyme
activity after exposure to the test compound indicates that the test compound has
antagonist activity to the GPCR.
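
By way of non-limiting illustration, the antagonist read-out described above might be reduced to a simple calculation on measured reporter enzyme activity. The following Python sketch is not part of the original disclosure: the function names, the subtraction of a basal (no agonist) signal and the 50% cutoff are hypothetical assumptions chosen only to make the comparison concrete.

```python
def percent_inhibition(agonist_only, agonist_plus_test, basal):
    """Percent reduction of the agonist-induced reporter signal.

    All three arguments are reporter enzyme activities in the same
    arbitrary units (e.g., relative light units). `basal` is the signal
    of untreated cells; subtracting it is an illustrative assumption.
    """
    window = agonist_only - basal
    if window <= 0:
        raise ValueError("agonist-only signal must exceed the basal signal")
    return 100.0 * (agonist_only - agonist_plus_test) / window


def has_antagonist_activity(agonist_only, agonist_plus_test, basal, cutoff=50.0):
    """Flag a test compound whose inhibition meets a hypothetical cutoff (in percent)."""
    return percent_inhibition(agonist_only, agonist_plus_test, basal) >= cutoff
```

In such a sketch, a test compound that lowers the agonist-induced signal from 1000 to 400 units over a basal signal of 200 units would score 75% inhibition. Any cutoff used in practice would need to be set empirically for the particular cell line and receptor.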

A further aspect of the present invention is a method of screening a sample
solution for the presence of an agonist, antagonist or ligand to a GPCR. A test cell
is provided that expresses GPCR as a fusion protein with a mutant reporter
enzyme, e.g., a β-galactosidase mutant, and contains, for example, a β-arrestin as a
fusion protein with a complementing reporter, e.g., another β-galactosidase mutant.
The test cell is exposed to a sample solution, and reporter enzyme activity is
assessed. Changed reporter enzyme activity after exposure to the sample solution
indicates the sample solution contains an agonist, antagonist or ligand for a GPCR
expressed in the cell.

A further aspect of the present invention is a method of screening a cell for
the presence of a GPCR. According to this aspect, an arrestin fusion protein with a
mutant reporter enzyme and a GPCR downstream signaling fusion protein with a
mutant reporter enzyme are employed to detect GPCR action. A modification of
this aspect of the invention can be employed to provide a method of screening a
plurality of cells for those cells which contain a GPCR. According to this aspect, a
plurality of cells containing a conjugate comprising a β-arrestin protein as a fusion
protein with a reporter enzyme are provided; the plurality of cells are exposed to a
GPCR agonist; and reporter enzyme activity is detected. An increase in
reporter enzymatic activity after exposure to the GPCR agonist indicates β-arrestin
protein binding to a GPCR, thereby indicating that the cell contains a GPCR
responsive to the GPCR agonist.

A further aspect of the invention is a method for mapping GPCR-mediated
signaling pathways. For instance, the system could be utilized to monitor
interaction of c-src with β-arrestin-1 upon GPCR activation. Additionally, the
system could be used to monitor protein/protein interactions involved in cross-talk
between GPCR signaling pathways and other pathways such as that of the receptor
tyrosine kinases or Ras/Raf. According to this aspect, a test cell is provided that
expresses a GPCR or other related protein with a mutant reporter enzyme, e.g., a
β-galactosidase mutant, and contains a protein from another pathway as a fusion
protein with a complementing mutant reporter enzyme, e.g., another
β-galactosidase mutant. Increased reporter enzymatic activity indicates
protein/protein interaction.

A further aspect of the invention is a method for monitoring homo- or
hetero-dimerization of GPCRs upon agonist or antagonist stimulation. Increasing
evidence indicates that GPCR dimerization is important for biological activity
(AbdAlla, et al., "AT1-receptor heterodimers show enhanced G-protein activation
and altered receptor sequestration." Nature, 407:94-98 (2000); Bockaert, et al.,
"Molecular tinkering of G protein-coupled receptors: an evolutionary success."
EMBO J. 18:1723-29 (1999)). Jordan, et al., "G-protein-coupled receptor
heterodimerization modulates receptor function." Nature, 399:697-700 (1999),
demonstrated that two non-functional opioid receptors, κ and δ, heterodimerize to
form a functional receptor. Gordon et al., "Dopamine D2 receptor dimers and
receptor blocking peptides." Bioch. Biophys. Res. Commun. 227:200-204 (1996),
showed different pharmacological properties associated with the monomeric and
dimeric forms of Dopamine receptor D2. The D2 receptors exist either as
monomers that are selective targets for spiperone or as dimer forms that are targets
for nemonapride. Herbert, et al., "A peptide derived from a β2-adrenergic receptor
transmembrane domain inhibits both receptor dimerization and activation." J.B.C.
271:16384-92 (1996), demonstrated that agonist stimulation
stabilized the dimeric state of the receptor, whereas inverse agonists favored the
monomeric form. Indeed, the same study showed that a peptide corresponding to
the sixth transmembrane domain of the β2-adrenergic receptor inhibited both
receptor dimerization and activation. Further, Angers et al., "Detection of
β2-adrenergic receptor dimerization in living cells using bioluminescence resonance
energy transfer," Proc. Natl. Acad. Sci. USA, 97(7):3684-3689, discloses the use of
β2-adrenergic receptor fusion proteins (i.e., β2-adrenergic receptor fused to
luciferase and β2-adrenergic receptor fused to an enhanced red-shifted green
fluorescent protein) to study β2-adrenergic receptor dimerization.

GPCR dimerization in the context of cellular physiology and
pharmacology can be monitored in accordance with the invention. For example,
β-galactosidase complementation can be measured in test cells that co-express GPCR
fusion proteins of β-galactosidase mutant enzymes, e.g., GPCR1Δα and GPCR2Δω
(FIGURE 27). According to this aspect, the interconversion between monomeric
and dimeric forms of the GPCRs or orphan receptors can be measured by mutant
reporter enzyme complementation. FIGURE 27 illustrates a test cell co-expressing
a GPCR or an orphan receptor as a fusion protein with the Δα form of β-galactosidase
mutant (e.g., GPCR1Δα), and the same GPCR or orphan receptor as a fusion
protein with the Δω form of β-galactosidase mutant (e.g., GPCR1Δω). Formation of
the GPCR homodimer is reflected by formation of an active enzyme, which can be
measured by enzyme activity assays, such as the Gal-Screen™ assay. Similarly,
hetero-dimerization between two distinct GPCRs, or two distinct orphan receptors,
or between one known GPCR and one orphan receptor can be analyzed in test cells
co-expressing two fusion proteins, e.g., GPCR1Δα and GPCR2Δω. The increased
β-galactosidase activity indicates that the two receptors can form a heterodimer.

A further aspect of the invention is a method of monitoring the
interconversion between the monomeric and dimeric form of GPCRs under the
influence of agonist or antagonist treatment. The dimer under test can be formed between
two copies of the same GPCR or orphan receptor (homodimer), or between two distinct GPCRs
or orphan receptors (heterodimer). Increased β-galactosidase activity after
treatment with a compound means that the compound binds to and/or stabilizes the
dimeric form of the receptor. Decreased β-galactosidase activity after
treatment with a compound means that the compound binds to and/or stabilizes the
monomeric form of the receptor.

A further aspect of the invention is a method of screening a cell for the
presence of a GPCR responsive to a GPCR agonist. A cell is provided that
contains protein partners that interact downstream in the GPCR's pathway. The
protein partners are expressed as fusion proteins to the mutant, complementing
enzyme and are used to monitor activation of the GPCR. The cell is exposed to a
GPCR agonist and then enzymatic activity of the reporter enzyme is detected.
Increased reporter enzyme activity indicates that the cell contains a GPCR
responsive to the agonist.

The present invention involves the use of a combination of proprietary
technologies (including ICAST™, Intercistronic Complementation Analysis
Screening Technology, Gal-Screen™, etc.) to monitor protein/protein interactions
in GPCR signaling. As disclosed in U.S. Application Serial No. 09/654,499, the
method of the invention in part involves using ICAST™, which in turn involves
the use of two inactive β-galactosidase mutants, each of which is fused with one of
two interacting target protein pairs, such as a GPCR and an arrestin. The formation
of an active β-galactosidase complex is driven by interaction of the target proteins.
In this system, β-galactosidase activity can be detected using, e.g., the
Gal-Screen™ assay system, wherein direct cell lysis is combined with rapid
ultrasensitive chemiluminescent detection of β-galactosidase reporter enzyme.
This system uses, e.g., a Galacton-Star® chemiluminescent substrate for
measurement in a luminometer as a read out of GPCR activity.

FIGURE 23 is a schematic depicting the use of the complementation
technology in the method of the present invention. FIGURE 23 shows two inactive
β-galactosidase mutants that become active when they are forced together by
specific interactions between the fusion partners of an arrestin molecule and an
activated GPCR or orphan receptor. This assay technology will be especially
useful in high throughput screening assays for ligand fishing for orphan receptors, a
process called de-orphaning. As illustrated in FIGURE 28, a β-galactosidase
fusion protein of an orphan receptor (e.g., GPCRorphanΔα) is co-expressed in the test
cell with a fusion protein of β-arrestin (e.g., β-ArrΔω). When the test cell is
subjected to compounds, which could be natural or synthetic, the increased
β-galactosidase activity means the compound is either a natural or surrogate ligand
for this GPCR. The same assay system can be used to find drug leads for the new
GPCRs. The increased β-galactosidase activity in the test cell after treatment
indicates the agonist activity of the compound. The decreased β-galactosidase
activity in the test cell indicates antagonist activity or inverse agonist activity of the
compound. In addition, the method of the invention could be used to monitor
GPCR-mediated signaling pathways via other downstream signaling components
such as G-proteins, GRKs or the proto-oncogene c-Src.
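
By way of non-limiting illustration, the interpretation of increased versus decreased β-galactosidase activity described above might be expressed as a fold change relative to an untreated control. The following Python sketch is not part of the original disclosure: the function name, the use of an untreated-cell control and the 1.5-fold cutoff are hypothetical assumptions chosen only for illustration.

```python
def classify_compound(treated, untreated, fold_cutoff=1.5):
    """Classify a test compound from reporter enzyme activity.

    `treated` and `untreated` are beta-galactosidase activities (same
    arbitrary units) of test cells with and without the compound.
    `fold_cutoff` is a hypothetical threshold, not a value taken from
    the disclosure.
    """
    if untreated <= 0 or fold_cutoff <= 1:
        raise ValueError("untreated must be > 0 and fold_cutoff must be > 1")
    fold = treated / untreated
    if fold >= fold_cutoff:
        # Increased complementation: consistent with agonist activity.
        return "agonist-like"
    if fold <= 1.0 / fold_cutoff:
        # Decreased complementation: consistent with antagonist or
        # inverse agonist activity.
        return "antagonist- or inverse-agonist-like"
    return "no call"
```

A compound that triples the signal would be called "agonist-like" under these assumptions, while one that halves it would be called "antagonist- or inverse-agonist-like"; changes within the cutoff band are left uncalled.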

The invention is achieved in part by using ICAST™ protein/protein
interaction screening to map signaling pathways. This technology is applicable to
a variety of known and unknown GPCRs with diverse functions. They include, but
are not limited to, the following sub-families of GPCRs:

(a) receptors that bind to amine-like ligands: Acetylcholine muscarinic
receptor (M1 to M5), alpha and beta Adrenoceptors, Dopamine receptors (D1, D2,
D3 and D4), Histamine receptors (H1 and H2), Octopamine receptor and Serotonin
receptors (5HT1, 5HT2, 5HT4, 5HT5, 5HT6, 5HT7);

(b) receptors that bind to a peptide ligand: Angiotensin receptor, Bombesin
receptor, Bradykinin receptor, C-C chemokine receptors (CCR1 to CCR8, and
CCR10), C-X-C type Chemokine receptors (CXC-R5), Cholecystokinin type A
receptor, CCK type receptors, Endothelin receptor, Neurotensin receptor,
FMLP-related receptors, Somatostatin receptors (type 1 to type 5) and Opioid receptors
(type D, K, M, X);

(c) receptors that bind to hormone proteins: Follicle-stimulating hormone
receptor, Thyrotrophin receptor and Lutropin-choriogonadotropic hormone
receptor;

(d) receptors that bind to neurotransmitters: substance P receptor,
Substance K receptor and neuropeptide Y receptor;

(e) Olfactory receptors: Olfactory type 1 to type 11, Gustatory and odorant
receptors;

(f) Prostanoid receptors: Prostaglandin E2 (EP1 to EP4 subtypes),
Prostacyclin and Thromboxane;

(g) receptors that bind to metabotropic substances: Metabotropic glutamate
group I to group III receptors;

(h) receptors that respond to physical stimuli, such as light, or to chemical
stimuli, such as taste and smell; and

(i) orphan GPCRs: the natural ligand to the receptor is undefined.

Use of the ICAST™ technology in combination with the invention
provides many benefits to the GPCR screening process, including the ability to
monitor protein interactions in any sub-cellular compartment (membrane, cytosol
and nucleus); the ability to achieve a more physiologically relevant model without
requiring protein overexpression; and the ability to achieve a functional assay for
receptor binding allowing high information content.

BRIEF DESCRIPTION OF THE DRAWINGS

FIGURE 1. Cellular expression levels of β2 adrenergic receptor (β2AR)
and β-arrestin-2 (βArr2) in C2 clones. Quantification of β-galactosidase (β-gal)
fusion protein was performed using antibodies against β-gal and purified β-gal
protein in a titration curve by a standardized ELISA assay. Figure 1A shows
expression levels of β2AR-βgalΔα clones (in expression vector pICAST ALC).
Figure 1B shows expression levels of βArr2-βgalΔω in expression vector pICAST
OMC4 for clones 9-3, -7, -9, -10, -19 and -24, or in expression vector pICAST
OMN4 for clones 12-4, -9, -16, -18, -22 and -24.

FIGURE 2. Receptor β2AR activation was measured by agonist-stimulated
cAMP production. C2 cells expressing pICAST ALC β2AR (clone 5) or parental
cells were treated with increasing concentrations of (-)isoproterenol and 0.1 mM
IBMX. The quantification of cAMP level was expressed as pmol/well.

FIGURE 3. Interaction of activated receptor β2AR and arrestin can be
measured by β-galactosidase complementation. Figure 3A shows a time course of
β-galactosidase activity in response to agonist (-)isoproterenol stimulation in C2
cells expressing β2AR-βgalΔα (β2AR alone, in expression vector pICAST ALC), or a
pool of doubly transduced C2 cells co-expressing β2AR-βgalΔα and βArr2-βgalΔω (in
expression vectors pICAST ALC and pICAST OMC) and clones isolated from the
same pool (43-1, 43-2, 43-7 and 43-8). Figure 3B shows a time course of
β-galactosidase activity in response to agonist (-)isoproterenol stimulation in C2 cells
expressing β2AR-βgalΔα alone (in expression vector pICAST ALC) and C2 clones
co-expressing β2AR-βgalΔα and βArr1-βgalΔω (in expression vectors pICAST
ALC and pICAST OMC).

FIGURE 4. Agonist dose response for interaction of β2AR and arrestin can
be measured by β-galactosidase complementation. Figure 4A shows a dose
response to agonists (-)isoproterenol and procaterol in C2 cells co-expressing
β2AR-βgalΔα and βArr2-βgalΔω fusion constructs. Figure 4B shows a dose
response to agonists (-)isoproterenol and procaterol in C2 cells co-expressing
β2AR-βgalΔα and βArr1-βgalΔω fusion constructs.

FIGURE 5. Antagonist mediated inhibition of receptor activity can be
measured by β-galactosidase complementation in cells co-expressing
β2AR-βgalΔα and βArr-βgalΔω. Figure 5A shows specific inhibition with adrenergic
antagonists ICI-118,551 and propranolol of β-galactosidase activity in C2 clones
co-expressing β2AR-βgalΔα and βArr2-βgalΔω fusion constructs after incubation
with agonist (-)isoproterenol. Figure 5B shows specific inhibition of
β-galactosidase activity with adrenergic antagonists ICI-118,551 and propranolol in
C2 clones co-expressing β2AR-βgalΔα and βArr1-βgalΔω fusion constructs in the
presence of agonist (-)isoproterenol.

FIGURE 6. C2 cells expressing adenosine receptor A2a show cAMP
induction in response to agonist (CGS-21680) treatment. C2 parental cells and C2
cells co-expressing A2aR-βgalΔα and βArr1-βgalΔω as a pool or as selected clones
(47-2 and 47-13) were measured for agonist-induced cAMP response (pmol/well).

FIGURE 7. Agonist stimulated cAMP response in C2 cells co-expressing
Dopamine receptor D1 (D1-βgalΔα) and β-arrestin-2 (βArr2-βgalΔω). The clone
expressing βArr2-βgalΔω (Arr2 alone) was used as a negative control in the assay.
Cells expressing D1-βgalΔα in addition to βArr2-βgalΔω responded to agonist
treatment (3-hydroxytyramine hydrochloride at 3 µM). D1(PIC2) or D1(PIC3)
designate D1 in expression vector pICAST ALC2 or pICAST ALC4, respectively.

FIGURE 8. A variety of mammalian cell lines can be used to generate stable
cells for monitoring GPCR and arrestin interactions. FIGURE 8A, FIGURE 8B and
FIGURE 8C show examples of HEK 293, CHO and CHW cell lines co-expressing
adrenergic receptor β2AR and arrestin as fusion proteins with β-galactosidase
mutants. The β-galactosidase activity was used to monitor agonist-induced
interaction of β2AR and arrestin proteins.

FIGURE 9. Beta-gal complementation can be used to monitor β2
adrenergic receptor homo-dimerization. FIGURE 9A shows β-galactosidase
activity in HEK 293 clones co-expressing β2AR-βgalΔα and β2AR-βgalΔω.
FIGURE 9B shows a cAMP response to agonist (-)isoproterenol in HEK 293
clones co-expressing β2AR-βgalΔα and β2AR-βgalΔω. HEK 293 parental cells
were included in the assays as negative controls.

FIGURE 10A. pICAST ALC: Vector for expression of β-galΔα as a
C-terminal fusion to the target protein. This construct contains the following
features: MCS, multiple cloning site for cloning the target protein in frame with the
β-galΔα; GS Linker, (GGGGS)n; NeoR, neomycin resistance gene; IRES, internal
ribosome entry site; ColE1 ori, origin of replication for growth in E. coli;
5'MoMuLV LTR and 3'MoMuLV LTR, viral promoter and polyadenylation
signals from the Moloney Murine leukemia virus.

FIGURE 10B. Nucleotide sequence for pICAST ALC.

FIGURE 11A. pICAST ALN: Vector for expression of β-galΔα as an
N-terminal fusion to the target protein. This construct contains the following
features: MCS, multiple cloning site for cloning the target protein in frame with the
β-galΔα; GS Linker, (GGGGS)n; NeoR, neomycin resistance gene; IRES, internal
ribosome entry site; ColE1 ori, origin of replication for growth in E. coli;
5'MoMuLV LTR and 3'MoMuLV LTR, viral promoter and polyadenylation
signals from the Moloney Murine leukemia virus.

FIGURE 11B. Nucleotide sequence for pICAST ALN.

FIGURE 12A. pICAST OMC: Vector for expression of β-galΔω as a
C-terminal fusion to the target protein. This construct contains the following
features: MCS, multiple cloning site for cloning the target protein in frame with the
β-galΔω; GS Linker, (GGGGS)n; Hygro, hygromycin resistance gene; IRES,
internal ribosome entry site; ColE1 ori, origin of replication for growth in E. coli;
5'MoMuLV LTR and 3'MoMuLV LTR, viral promoter and polyadenylation
signals from the Moloney Murine leukemia virus.

FIGURE 12B. Nucleotide sequence for pICAST OMC.

FIGURE 13A. pICAST OMN: Vector for expression of β-galΔω as an
N-terminal fusion to the target protein. This construct contains the following
features: MCS, multiple cloning site for cloning the target protein in frame with the
β-galΔω; GS Linker, (GGGGS)n; Hygro, hygromycin resistance gene; IRES,
internal ribosome entry site; ColE1 ori, origin of replication for growth in E. coli;
5'MoMuLV LTR and 3'MoMuLV LTR, viral promoter and polyadenylation
signals from the Moloney Murine leukemia virus.

FIGURE 13B. Nucleotide sequence for pICAST OMN.

FIGURE 14. pICAST ALC βArr2: Vector for expression of β-galΔα as a
C-terminal fusion to β-arrestin-2. The coding sequence of human β-arrestin-2
(GenBank Accession Number: NM_004313) was cloned in frame to β-galΔα in a
pICAST ALC vector.

FIGURE 15. pICAST OMC βArr2: Vector for expression of β-galΔω as a
C-terminal fusion to β-arrestin-2. The coding sequence of human β-arrestin-2
(GenBank Accession Number: NM_004313) was cloned in frame to β-galΔω in a
pICAST OMC vector.

FIGURE 16. pICAST ALC βArr1: Vector for expression of β-galΔα as a
C-terminal fusion to β-arrestin-1. The coding sequence of human β-arrestin-1
(GenBank Accession Number: NM_004041) was cloned in frame to β-galΔα in a
pICAST ALC vector.

FIGURE 17. pICAST OMC βArr1: Vector for expression of β-galΔω as a
C-terminal fusion to β-arrestin-1. The coding sequence of human β-arrestin-1
(GenBank Accession Number: NM_004041) was cloned in frame to β-galΔω in a
pICAST OMC vector.

FIGURE 18. pICAST ALC β2AR: Vector for expression of β-galΔα as a
C-terminal fusion to β2 Adrenergic Receptor. The coding sequence of human β2
Adrenergic Receptor (GenBank Accession Number: NM_000024) was cloned in
frame to β-galΔα in a pICAST ALC vector.

FIGURE 19. pICAST OMC β2AR: Vector for expression of β-galΔω as a
C-terminal fusion to β2 Adrenergic Receptor. The coding sequence of human β2
Adrenergic Receptor (GenBank Accession Number: NM_000024) was cloned in
frame to β-galΔω in a pICAST OMC vector.

FIGURE 20. pICAST ALC A2aR: Vector for expression of β-galΔα as a
C-terminal fusion to Adenosine 2a Receptor. The coding sequence of human
Adenosine 2a Receptor (GenBank Accession Number: NM_000675) was cloned
in frame to β-galΔα in a pICAST ALC vector.

FIGURE 21. pICAST OMC A2aR: Vector for expression of β-galΔω as a
C-terminal fusion to Adenosine 2a Receptor. The coding sequence of human
Adenosine 2a Receptor (GenBank Accession Number: NM_000675) was cloned
in frame to β-galΔω in a pICAST OMC vector.

FIGURE 22. pICAST ALC D1: Vector for expression of β-galΔα as a
C-terminal fusion to Dopamine D1 Receptor. The coding sequence of human
Dopamine D1 Receptor (GenBank Accession Number: X58987) was cloned in
frame to β-galΔα in a pICAST ALC vector.

FIGURE 23. A schematic depicting use of the complementation
technology in the method of the invention. FIGURE 23 shows two inactive
mutant reporter enzymes that become active when the corresponding fusion
partners, GPCR and β-arrestin, interact.

FIGURE 24. Vector for expression of a GPCR with inserted
serine/threonine amino acid sequences as a fusion with β-galΔα. The open
reading frame of a known or orphan GPCR is engineered to contain additional
serine/threonine sequences, such as SSS (serine, serine, serine), within
the C-terminal tail. The engineered GPCR is cloned in frame with β-galΔα in a
pICAST ALC vector. The pICAST ALC vector contains the following features:
MCS, multiple cloning site for cloning the target protein in frame with the β-galΔα;
GS Linker, (GGGGS)n; NeoR, neomycin resistance gene; IRES, internal ribosome
entry site; ColE1 ori, origin of replication for growth in E. coli; 5'MoMuLV LTR
and 3'MoMuLV LTR, viral promoter and polyadenylation signals from the
Moloney Murine leukemia virus.

FIGURE 25. Vector for expression of mutant (R170E) β-arrestin2 as a
fusion with β-galΔω. The open reading frame of β-arrestin2 is engineered to
contain a point mutation that converts arginine 170 to a glutamate. The mutant
β-arrestin2 is cloned in frame with β-galΔω in a pICAST OMC vector. The pICAST
OMC vector contains the following features: MCS, multiple cloning site for
cloning the target protein in frame with the β-galΔω; GS Linker, (GGGGS)n;
Hygro, hygromycin resistance gene; IRES, internal ribosome entry site; ColE1 ori,
origin of replication for growth in E. coli; 5'MoMuLV LTR and 3'MoMuLV LTR,
viral promoter and polyadenylation signals from the Moloney Murine leukemia
virus.

FIGURE 26. Phosphorylation insensitive Mutant R170E β-Arrestin2Δω
binds to β2ARΔα in Response to Agonist Activation. A parental β2ARΔα C2 cell
line was transduced with the Mutant R170E β-Arrestin2Δω construct. Clonal
populations co-expressing the two constructs were plated at 10,000 cells/well in
96 well plates and treated with 10 µM (-)isoproterenol, 0.3 mM ascorbic acid for the
indicated time period. β-Galactosidase activity was measured by addition of Tropix
Gal-Screen™ assay system substrate (Applied Biosystems) and luminescence was
measured using a Tropix TR717™ luminometer (Applied Biosystems). Treatments
were performed in triplicate. For comparison, a clonal cell line (43-8)
co-expressing β2ARΔα and wild-type β-Arrestin2Δω was also plated at 10,000
cells/well and given the same agonist treatment regimen. Minutes of
(-)isoproterenol treatment are shown on the X-axis and β-galactosidase activity
indicated by relative light units (RLU) is shown on the Y-axis.

FIGURE 27. GPCR dimerization measured by β-galactosidase
complementation. A schematic depicting the utilization of the invention for
monitoring GPCR homo- or hetero-dimerization. One GPCR is fused to one
complement enzyme fragment, while the second GPCR is fused to the second
complement enzyme fragment. Interaction of the two GPCRs is monitored by
complementation of the enzyme fragments to produce an active enzyme complex
(i.e., β-galactosidase activity). GPCR homo- or hetero-dimerization can be
monitored in the absence or presence of ligand, agonists, inverse agonists or
antagonists.

FIGURE 28. Ligand fishing for orphan receptors by β-galactosidase
mutant complementation in ICAST™ system. A schematic depicting the
utilization of the invention for ligand fishing and agonist/antagonist screening for
orphan GPCRs. As an example, a test cell expressing two β-gal fusion proteins,
GPCRorphan-Δα and Arrestin-Δω, is subjected to treatments with samples from
natural or synthetic compound libraries, or from tissue extracts, or from
conditioned media of cultured cells. An increased β-gal activity after treatment
indicates the activation of the orphan receptor by a ligand in the testing sample.
The readout of increased β-gal activity reflects the interaction of an activated
GPCR orphan receptor with a β-arrestin. Therefore, a cognate or a surrogate ligand
for the testing receptor is identified.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS 

The present invention provides a method to interrogate GPCR function and
pathways. The G-protein-coupled superfamily continues to expand rapidly as new
receptors are discovered through automated sequencing of cDNA libraries or
genomic DNA. It is estimated that several thousand GPCRs may exist in the
human genome. Only a portion have been cloned and even fewer have been
associated with ligands. The means by which these, or newly discovered orphan
receptors, will be associated with their cognate ligands and physiological functions
represents a major challenge to biological and biomedical research. The
identification of an orphan receptor generally requires an individualized assay and
a guess as to its function. The present invention involves the interrogation of
GPCR function by monitoring the activation of the receptor using activation
dependent protein-protein interactions between the test GPCR or orphan receptor
and a β-arrestin. The specific protein-protein interactions are measured using the
mutant enzyme complementation technology disclosed herein. This assay system
eliminates the prerequisite guessing because it can be performed with or without
prior knowledge of other signaling events. It is sensitive, rapid and easily
performed and is applicable to nearly all GPCRs because the majority of these
receptors desensitize by a common mechanism.

The present invention provides a complete assay system for monitoring
protein-protein interactions in GPCR pathways. The invention employs the
complementation technology, ICAST™ (Intercistronic Complementation Analysis
Screening Technology as disclosed in pending U.S. patent application serial no.
053,614, filed April 1, 1998, the entire contents of which are incorporated herein
by reference). The ICAST™ technology involves the use of two mutant forms of a
reporter enzyme fused to proteins of interest. When the proteins of interest do not
interact, the reporter enzyme remains inactive. When the proteins of interest do
interact, the reporter enzyme mutants come together and form an active enzyme.
According to an embodiment of the invention, the activity of β-galactosidase may
be detected with the Gal-Screen™ assay system developed by Advanced Discovery
Sciences™, which involves the use of Galacton-Star®, an ultrasensitive
chemiluminescent substrate. The Gal-Screen™ assay system and the
Galacton-Star® chemiluminescent substrate are disclosed in U.S. Patent Nos. 5,851,771;
5,538,847; 5,326,882; 5,145,772; 4,978,614; and 4,931,569, the contents of which
are incorporated herein by reference in their entirety. The invention provides an
array of assays, including GPCR binding assays, that can be achieved directly
within the cellular environment in a rapid, non-radioactive assay format. The
methods of the invention are an advancement over the invention disclosed in U.S.
Patent Nos. 5,891,646 and 6,110,693 and the method disclosed in Angers et al.,
supra, which rely on microscopic imaging or spectrometry of GPCR components
as fusions with Green-fluorescent-protein. The imaging technique disclosed in U.S.
Patent Nos. 5,891,646 and 6,110,693 and the spectrometry-based technique in Angers
et al. are limited by low throughput and lack of thorough quantification.
The assay system of the invention combined with Advanced Discovery
Sciences™ technologies provides highly sensitive cell-based methods for
interrogating GPCR pathways which are amenable to high-throughput screening
(HTS). Among some of the technologies developed by Advanced Discovery
Sciences™ that may be used with the present invention are the Gal-Screen™ assay
system (discussed above) and the cAMP-Screen™ immunoassay system. The
cAMP-Screen™ immunoassay system provides ultrasensitive determination of
cAMP levels in cell lysates. The cAMP-Screen™ assay utilizes the high-sensitivity
chemiluminescent alkaline phosphatase (AP) substrate CSPD® (disodium
3-(4-methoxyspiro{1,2-dioxetane-3,2'-(5'-chloro)tricyclo[3.3.1.1^3,7]decan}-4-yl)phenyl
phosphate) with Sapphire-II™ luminescence enhancer.

Unlike yeast-based two-hybrid assays used to monitor protein/protein
interactions in high-throughput assays, the present invention (1) is applicable to a
variety of cells including mammalian cells, plant cells, prokaryotic cells such as E.
coli and cells of invertebrate origin such as yeast, slime mold (Dictyostelium) and
insects; (2) detects interactions at the membrane at the site of the receptor target or
in the cytosol at the site of downstream target proteins rather than a limited cellular
localization, i.e., nucleus; and (3) does not rely on indirect read-outs such as
transcriptional activation. The present invention thus provides assays with greater
physiological relevance and fewer false positives.

The present inventors have developed modifications to the embodiment
disclosed in U.S. patent application serial no. 053,614 described above in order to
enhance the sensitivity of the inventive GPCR assay. According to an
embodiment, the invention incorporates the use of serine/threonine clusters to
enhance and prolong the interaction of GPCR with arrestin in order to make the
detection more robust. The clusters can be utilized for orphan receptors or known
GPCRs which do not have this sequence motif. By adding this sequence to the
C-terminal tail of the receptor, the activation of the receptor can be detected more
readily by readouts of arrestin binding to GPCR, i.e., β-galactosidase
complementation from fusion proteins of target proteins with β-galactosidase
mutants.

According to another embodiment, the invention incorporates the use of
arrestin point mutations to bypass the requirement of phosphorylation, by the
action of a specific GRK, on the C-terminal tail or intracellular loops of the GPCR upon
activation. The applications include cases i) wherein the cognate GRK for a particular
GPCR or orphan receptor is unknown; and ii) wherein the specific GRK for the
receptor of interest (or under test) may not be present or may have low activity in
the host cell that is used for the receptor activation assay.

According to another embodiment, the invention incorporates the use of a
super arrestin to increase the binding efficiency of arrestin to an activated GPCR
and to stabilize the GPCR/arrestin complex during GPCR desensitization. This
application can be used to increase the robustness of ICAST/GPCR applications in
cases where the GPCR is normally resensitized rapidly post desensitization.
Each of these methodologies is discussed below.

The invention will now be described in the following non-limiting
examples.

EXAMPLE:

According to an embodiment of the invention, GPCR activation is
measured through monitoring the binding of arrestin to ligand-activated GPCR. In
this assay system, a GPCR, e.g., β-adrenergic receptor (β2AR), and an arrestin,
e.g., β-arrestin, are co-expressed in the same cell as fusion proteins with mutant
forms of a reporter enzyme, e.g., β-galactosidase (β-gal). As illustrated in Figure
23, the β2AR is expressed as a fusion protein with the Δα form of β-gal mutant
(β2ARΔα) and the β-arrestin as a fusion protein with the Δω form of β-gal mutant
(β-ArrΔω). The two fusion proteins, which at first exist in a resting (or
un-stimulated) cell in separate compartments, i.e., the membrane for GPCR and the
cytosol for arrestin, cannot form an active β-galactosidase enzyme. When such a
cell is treated with an agonist or a ligand, the ligand-occupied and activated
receptor becomes a high affinity binding site for arrestin. The interaction between
an activated GPCR, β2ARΔα, and arrestin, β-ArrΔω, drives the β-gal mutant
complementation. The enzyme activity can be measured by using an enzyme
substrate, which upon cleavage releases a product measurable by colorimetry,
fluorescence, or chemiluminescence (e.g., the Gal-Screen™ assay system).

Experiment protocol-

1. In the first step, the expression vectors for β2ARΔα and βArr2Δω were
engineered in selectable retroviral vectors pICAST ALC, as described in Figure 18,
and pICAST OMC, as described in Figure 15.

2. In the second step, the two expression constructs were transduced into
either C2C12 myoblast cells, or other mammalian cell lines, such as COS-7, CHO,
A431, HEK 293, and CHW. Following selection with antibiotic drugs, stable
clones expressing both fusion proteins at appropriate levels were selected.

3. In the last step, the cells expressing both β2ARΔα and βArr2Δω were
tested for response by agonist/ligand stimulated β-galactosidase activity. Triplicate
samples of cells were plated at 10,000 cells in 100 microliter volume into a well of
a 96-well culture plate. Cells were cultured for 24 hours before assay. For agonist
assay (Figures 3 and 4), cells were treated with variable concentrations of agonist,
for example, (-)isoproterenol, procaterol, dobutamine, terbutaline or
L-phenylephrine for 60 min at 37° C. The induced β-galactosidase activity was
measured by addition of Tropix Gal-Screen™ assay system substrate (Applied
Biosystems) and luminescence measured in a Tropix TR717™ luminometer
(Applied Biosystems). For antagonist assay (Figure 5), cells were pre-incubated for
10 min in fresh medium without serum in the presence of ICI-118,551 or
propranolol followed by addition of 10 micromolar (-)isoproterenol.
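
The triplicate luminescence readings described in step 3 can be reduced to two
simple figures of merit. The following is a minimal sketch only, assuming
relative light unit (RLU) readings are available as plain lists; the function
names and the example values in the test are hypothetical and are not part of
the disclosed protocol.

```python
from statistics import mean


def fold_induction(stimulated_rlu, basal_rlu):
    """Mean agonist-stimulated RLU divided by mean unstimulated (basal) RLU."""
    basal = mean(basal_rlu)
    if basal <= 0:
        raise ValueError("basal signal must be positive")
    return mean(stimulated_rlu) / basal


def percent_inhibition(antagonist_rlu, agonist_rlu, basal_rlu):
    """Percent of the agonist-induced signal window removed by an antagonist."""
    window = mean(agonist_rlu) - mean(basal_rlu)
    if window <= 0:
        raise ValueError("agonist signal must exceed basal signal")
    return 100.0 * (mean(agonist_rlu) - mean(antagonist_rlu)) / window
```

Each argument is a list holding the triplicate wells for one condition, so a
dose response or an antagonist series is obtained by calling these functions
once per concentration.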

Serine/Threonine Cluster Strategy
Background

Based on structure-function relationship studies on β-arrestins, a large
region within the amino-terminal half of β-arrestins (termed the
activation-recognition domain) recognizes the agonist-activated state of GPCRs. This region
of β-arrestin also contains a small positively charged domain (approximately 20
amino acids with net charge +7) called the phosphorylation-recognition domain,
which appears to interact with the GRK-phosphorylated carboxyl termini of
GPCRs.

GPCRs can be divided into two classes based on their affinities for
β-arrestins. Oakley et al., "Association of β-Arrestin with G Protein-Coupled
Receptors During Clathrin-Mediated Endocytosis Dictates the Profile of Receptor
Resensitization." J. Biol. Chem., 274(45):32248-32257 (1999). The molecular
determinants underlying this classification appear to reside in specific serine or
threonine residues located in the carboxyl-terminal tail of the receptor. The
receptor class that contains serine/threonine clusters (defined as serine or threonine
residues occupying three consecutive or three out of four positions) in the
carboxyl-termini binds β-arrestin with high affinity upon activation and
phosphorylation and remains bound with β-arrestin even after receptor
internalization, whereas the receptor class that contains only scattered serine and
threonine residues in the carboxy-terminal tail binds β-arrestins with less affinity
and disassociates from the β-arrestin upon internalization. Several known GPCRs,
such as vasopressin V2 receptor (Oakley et al.), neurotensin receptor 1 and
angiotensin II receptor type 1A (Zhang et al., "Cellular Trafficking of G
Protein-Coupled Receptor/β-Arrestin Endocytic Complexes." J. Biol. Chem.,
274(16):10999-11006 (1999)), which possess one or more of such serine/threonine
clusters in their carboxyl-termini, were shown to bind β-arrestins with high
affinity.
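
The cluster definition given above (serine or threonine at three consecutive
positions, or at three out of four positions) can be checked mechanically on a
carboxyl-terminal tail sequence. The following is a minimal sketch, assuming
the tail is supplied as a one-letter amino acid string; the function names and
the short test sequences are hypothetical illustrations, not receptor
sequences from the disclosure.

```python
def st_cluster_windows(tail):
    """Return (0-based start, window) pairs for Ser/Thr clusters in a tail.

    A cluster is Ser/Thr at three consecutive positions, or at three out of
    four positions (a four-residue window that starts and ends with Ser/Thr).
    Overlapping windows are all reported.
    """
    tail = tail.upper()
    is_st = [residue in "ST" for residue in tail]
    hits = []
    for i in range(len(tail)):
        if i + 3 <= len(tail) and all(is_st[i:i + 3]):
            hits.append((i, tail[i:i + 3]))
        elif (i + 4 <= len(tail) and is_st[i] and is_st[i + 3]
              and sum(is_st[i:i + 4]) >= 3):
            hits.append((i, tail[i:i + 4]))
    return hits


def has_st_cluster(tail):
    """True if the tail contains at least one Ser/Thr cluster."""
    return bool(st_cluster_windows(tail))
```

A tail for which has_st_cluster returns False falls into the class with only
scattered serine and threonine residues under this definition.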

WO 01/58923 PCT/US01/00684

EXAMPLE 

According to an embodiment of the invention, a serine/threonine cluster
strategy is used to facilitate screening assays for orphan receptors that do not
possess this structural motif of their own. The orphan receptors are easily classified
by sequence alignment. Orphan receptors lacking the serine/threonine clusters are
each cloned into an expression vector that is modified to introduce one or more
serine/threonine cluster(s) to the carboxyl-terminal tail of the receptor (FIGURE
24). The serine/threonine clusters enhance the receptor activation dependent
interaction between the activated and phosphorylated receptor (negative charges)
and β-arrestin (positive charges in the phosphorylation-recognition domain)
through strong ionic interactions, thus prolonging interaction between the receptor
and arrestin. The modification of the orphan receptor tail thus makes detection of
receptor activation more robust.

Experiment protocol -

1. In a first step, the open-reading-frame (ORF) of an orphan receptor,
which lacks the serine/threonine clusters, is cloned into a modified expression
vector such as pICAST ALC described in Figure 10A. The modified pICAST ALC
includes coding sequences for one or more sets of serine/threonine clusters (for
example, SSS or SST) located downstream from the insert of the ORF of an orphan
receptor (FIGURE 24).

2. In a second step, the chimeric orphan receptor, ORForphanR-(SSS)n-Δα, is
co-expressed in a mammalian cell with a β-arrestin chimera, such as
βArr2Δω described in Figure 15.

3. In a third step, the cell is treated with an agonist or a ligand and the
activated receptor with phosphorylated serine cluster(s) binds the β-arrestin with
high affinity, producing strong signals in readouts of β-gal complementation.
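
The in-frame arrangement described in step 1, an ORF followed by one or more
cluster coding sequences, can be illustrated at the sequence level. This is a
minimal sketch under stated assumptions: the codon choices (AGC for serine,
ACC for threonine), the function name and the toy ORF in the test are
hypothetical, and the linker and β-galΔα sequences that follow in the actual
vector are not modeled.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
# Assumed codon choices for illustration; any synonymous codon would serve.
CODON_FOR = {"S": "AGC", "T": "ACC"}


def append_cluster_codons(orf, cluster="SSS", copies=1):
    """Return the ORF with its stop codon removed and cluster codons appended.

    Removing the stop codon keeps downstream sequence in the same reading frame.
    """
    orf = orf.upper()
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    if orf[-3:] in STOP_CODONS:
        orf = orf[:-3]
    try:
        cluster_dna = "".join(CODON_FOR[residue] for residue in cluster.upper())
    except KeyError:
        raise ValueError("cluster may contain only S and T") from None
    return orf + cluster_dna * copies
```

Because every appended unit is a whole codon, the result stays a multiple of
three in length, which is the property the in-frame fusion depends on.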

This assay, which provides a means for sensitive measurement of functional
activation of the orphan receptors, can be used to screen for natural or surrogate
ligands for orphan receptors, a process called de-orphaning or target discovery for
new GPCRs (FIGURE 28). Furthermore, this assay is also useful in screening for
potential agonists and antagonists for lead discovery of GPCRs.

Enhanced Binding of Arrestin in the Presence and in the Absence of GPCR
Phosphorylation
Background

Six different classes of G-protein coupled receptor kinases (GRKs) have
been identified and each of these has been reported to be expressed as multiple
splice variants. Krupnick et al., "The role of receptor kinases and arrestins in G
protein-coupled receptor regulation." Ann. Rev. Pharmacol. Toxicol., 38:289-319
(1998). Although many cell lines express a variety of GRKs, the specific GRK
required for phosphorylation of a given GPCR may not always be present in the
cell line used for recombinant GPCR and arrestin expression. This is particularly
an issue for applications using orphan receptors, in which case the cognate GRK
will likely be unknown. In other cases, the cell line used for recombinant
expression work may have the required GRK, but may express the GRK at low
levels. In order to bypass such limitations, genetically modified arrestins that bind
specifically to activated GPCRs, but without the requirement of GRK
phosphorylation, are employed.

Mutagenesis studies on arrestins demonstrate that point mutations in the
phosphorylation-recognition domain, particularly mutations converting Arg175 (of
visual arrestin) to an oppositely charged residue such as glutamate (R175E
mutation), result in an arrestin which specifically binds to activated GPCRs, but
does so without the requirement for phosphorylation.
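
Mutation names of the form R175E encode the wild-type residue, its 1-based
position and the replacement residue. The following minimal sketch applies
such a substitution to a protein sequence and checks that the stated wild-type
residue is actually present at that position; the function name and the toy
sequence in the test are hypothetical, and no real arrestin sequence is
reproduced here.

```python
import re


def point_mutant(protein, mutation):
    """Apply a substitution such as 'R175E' to a one-letter protein sequence.

    Positions are 1-based. Raises ValueError if the notation is malformed, the
    position is out of range, or the wild-type residue does not match.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation.upper())
    if match is None:
        raise ValueError(f"unrecognized mutation notation: {mutation!r}")
    wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
    index = position - 1
    protein = protein.upper()
    if index < 0 or index >= len(protein):
        raise ValueError(f"position {position} is outside the sequence")
    if protein[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {protein[index]}"
        )
    return protein[:index] + replacement + protein[index + 1:]
```

The wild-type check matters when a mutation is transferred between homologs,
since the equivalent residue can sit at a different position number in each
protein.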

Numerous observations have led to the hypothesis that arrestin exists in an
inactive state that has a low affinity for GPCRs. Once a GPCR is both activated
and phosphorylated, the phosphorylated region of the GPCR C-terminus interacts
with the phosphorylation-recognition domain of arrestin, causing the arrestin to
change conformation and allowing the activation-recognition region to be exposed for
binding to the activated/phosphorylated receptor. Vishnivetskiy et al., "How does
arrestin respond to the phosphorylated state of rhodopsin?" J. Biol. Chem.,
274(17):11451-11454 (1999); Gurevich et al., "Arrestin interactions with G
protein-coupled receptors. Direct binding studies of wild-type and mutant arrestins
with rhodopsin, beta 2-adrenergic and m2 muscarinic cholinergic receptors." J.
Biol. Chem., 270(2):720-731 (1995); Gurevich et al., "Mechanism of
phosphorylation-recognition by visual arrestin and the transition of arrestin into a
high affinity binding site." Mol. Pharmacol., 51(1):161-169 (1997); Kovoor et al.,
"Targeted construction of phosphorylation-independent beta-arrestin mutants with
constitutive activity in cells." J. Biol. Chem., 274(11):6831-6834 (1999). In
summary, binding studies of single mutation, double mutation, deletion, and
chimerical arrestins with inactive, inactive and phosphorylated, activated but not
phosphorylated, or activated and phosphorylated visual or non-visual GPCRs all
support this model.

EXAMPLE 

A phosphorylation insensitive mutant of arrestin fused to mutant reporter
protein can be produced that will bind to activated GPCRs in a phosphorylation
independent manner. As proof of concept, a point mutation for β-arrestin2, R170E
β-arrestin2, has been produced and its interaction with β2AR has been analyzed in
accordance with the invention.

Experimental protocol:

1) In the first step, β-arrestin2 was mutated such that Arg170 was converted to
Glu. This mutation is equivalent to the R175E mutation of visual arrestin. The
mutant β-arrestin2 open reading frame was cloned in frame with
Δω-β-galactosidase in the pICAST OMC expression vector to produce a modified
expression vector R170E β-arrestin2 (FIGURE 25).

2) In the second step, the R170E β-arrestin2 expression construct was
transduced into a C2C12 myoblast cell line that had been engineered to express
β2AR as a fusion to Δα-β-galactosidase as described in Figure 18 of U.S.
Application Serial No. 09/654,499. Following selection with antibiotic drugs, a
population of clones expressing both fusion proteins was obtained.

3) In the last step, this population of cells expressing both R170E
β-arrestin2Δω and β2ARΔα was tested for response by agonist/ligand stimulated
β-galactosidase activity as demonstrated in FIGURE 26. The C2C12 clone 43-8
co-expressing β2ARΔα and wild-type β-arrestin2Δω (FIGURE 26) was used as
reference control. Triplicate samples of cells were plated at 10,000 cells in 100
microliter volume into wells of a 96-well culture plate. Cells were cultured for 24
hours before assay. For agonist assay as in FIGURE 26, cells were treated with
10 µM (-)isoproterenol stabilized with 0.3 mM ascorbic acid at 37° C for 0, 5, 10, 15,
30, 45 or 60 minutes. The induced β-galactosidase activity was measured by
addition of Tropix Gal-Screen™ assay system substrate (Applied Biosystems) and
luminescence measured in a Tropix TR717™ luminometer (Applied Biosystems).
As shown in Figure 26, the mutant arrestin interacts with β2AR in an
agonist-dependent manner, and the interaction was comparable with that of wild-type arrestin.

4) To expand the application of phosphorylation-insensitive arrestin, cell lines
such as C2C12, CHO or HEK 293 are developed that express the R170E
β-arrestin2Δω construct. These cell lines can be used to transduce orphan or
known GPCRs as fusions with Δα-β-galactosidase in order to develop cell lines for
agonist and antagonist screening and

Development of Super Arrestins:
Background

Attenuation of GPCR signaling by the arrestin pathway serves to ensure
that a cell or organism does not over-react to a stimulus. At the same time, the
arrestin pathway often serves to recycle the GPCR such that it can be temporarily
inactivated but then quickly resensitized to allow for sensitivity to new stimuli.
The down-regulation process involves phosphorylation of the receptor, binding to
arrestin and endocytosis. Following endocytosis of the desensitized receptor, the
receptor is either degraded in lysosomes or resensitized and sent back to the
membrane. Resensitization involves release of arrestin from the receptor,
dephosphorylation and cycling back to the membrane. The actual route a GPCR
follows upon activation depends on its biological function and the needs of the
organism. Because of the diverse routes that may be required of the
down-regulation pathway, arrestin affinities for activated GPCRs vary from receptor to
receptor. It would thus be very advantageous to engineer super arrestins that have
a higher affinity and avidity for activated GPCRs than what nature has provided.

Although mutational, deletion and chimerical studies of arrestins have
focused on understanding regulatory switches in the molecule that respond to
GPCR phosphorylation states, several of these altered recombinant forms of
arrestin have resulted in molecules with enhanced binding to activated,
phosphorylated GPCRs. Conversion of Arg175 to histidine, tyrosine,
phenylalanine or threonine results in significantly higher amounts of binding to
phosphorylated, activated rhodopsin than wild-type arrestin or R175E arrestin,
although these mutations result in less binding to activated, non-phosphorylated
receptor. Gurevich et al. (1997). In addition, conversion of Valine 170 to alanine
increased the constitutive effect of the R175E mutation, but also nearly doubled the
amount of interaction of wild-type arrestin with activated, phosphorylated
rhodopsin. Gurevich et al. (1997).

Truncation of β-arrestin1 at amino acid 382 has been reported to enhance
binding of both R169E (equivalent to arrestin R175E) and wild-type β-arrestin1 to
activated or activated and phosphorylated receptor, respectively. Kovoor et al.
Chimeric arrestins in which functional regions of visual arrestin were swapped
with those of β-arrestin1 have been reported to be altered in binding affinity to
activated, phosphorylated GPCRs. Gurevich et al. (1995). Several of these
chimeras, such as β-arrestin1 containing the visual arrestin extreme N-terminus,
show increased specific binding to phosphorylated activated GPCRs compared to
wild-type β-arrestin1 (Gurevich et al. (1995)). Modifications that enhance arrestin
affinity for the activated GPCR such as described above, whether phosphorylated
or non-phosphorylated, could also enhance the signal-to-noise ratio of
β-galactosidase activity since the arrestin/GPCR complex is stabilized and/or more
long-lived. The use of mutant arrestins with higher activated-GPCR affinity would
improve the inventive technology for GPCR targets, without compromising
receptor/ligand biology.

In addition, this "super arrestin" approach can be combined with the use of
arrestin point mutations to provide a stronger signal-to-noise ratio with or
without GRK requirements.

EXAMPLE

An arrestin mutant fused to a mutant reporter protein can be produced to
enhance binding of the arrestin to an activated GPCR and thereby enhance
sensitivity of detection.

Experiment protocol -

1) In the first step, mutant β-arrestin2 constructs will be generated which
include R170E/T/Y or H, V165A, substitution of a.a. 1-43 with a.a. 1-47 of visual
arrestin, or deletion of the C-terminus, and combinations of these alterations. The
mutant β-arrestin2 open reading frames will be cloned in frame with
Δω-β-galactosidase in the pICAST OMC expression vector, similar to cloning of the
R170E β-arrestin2 mutation shown in FIGURE 25.

2) In the second step, mutant expression constructs will be transduced into a
C2C12 myoblast cell line that has been engineered to express β2AR as a fusion to
Δα-β-galactosidase. Following selection with antibiotic drugs, a population of
clones expressing both fusion proteins will be obtained. Wild-type and R170E
β-arrestin2 constructs will be transduced to generate control, reference clonal
populations.

3) In the third step, populations of cells expressing both β-arrestin2Δω (mutant
or wild type) and β2ARΔα will be tested for response by agonist/ligand-stimulated
β-galactosidase activity.

4) In the next step, mutant (super) β-arrestin2Δω constructs that show a
significantly higher signal-to-noise ratio in the agonist assay compared with
wild-type β-arrestin2Δω will be chosen. These constructs will be used to develop
stable cell lines expressing the "super" β-arrestin2Δω that can be used for
transducing in known or orphan GPCRs. Use of a super β-arrestin2Δω could
increase the signal-to-noise ratio of ICAST/GPCR applications, allowing improved
screening capabilities for lead and ligand discovery.
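
The selection in step 4 can be sketched in code. The following minimal Python sketch is an illustration only and is not part of the described protocol. It assumes that the signal-to-noise ratio is taken as the mean agonist-stimulated β-galactosidase reading divided by the mean unstimulated reading, and it keeps those mutants whose ratio exceeds the wild-type reference by a chosen factor. The function names, the `min_fold` cutoff (the text says only "significantly higher") and every reading below are hypothetical placeholders, not measured data.

```python
from statistics import mean


def signal_to_noise(stimulated, basal):
    """Mean agonist-stimulated reading divided by mean unstimulated reading."""
    return mean(stimulated) / mean(basal)


def select_super_arrestins(readings, reference="wild-type", min_fold=1.5):
    """Return {construct: ratio} for constructs beating the reference.

    readings maps a construct name to (stimulated, basal) lists of
    luminescence readings. min_fold is an assumed cutoff, not a value
    taken from the text.
    """
    reference_ratio = signal_to_noise(*readings[reference])
    chosen = {}
    for name, (stimulated, basal) in readings.items():
        if name == reference:
            continue
        ratio = signal_to_noise(stimulated, basal)
        if ratio >= min_fold * reference_ratio:
            chosen[name] = ratio
    return chosen


# Hypothetical placeholder readings (arbitrary luminescence units).
example_readings = {
    "wild-type": ([300, 320, 310], [100, 100, 100]),
    "mutant_A": ([900, 930, 960], [150, 150, 150]),
    "mutant_B": ([350, 350, 350], [100, 100, 100]),
}

if __name__ == "__main__":
    for construct, ratio in select_super_arrestins(example_readings).items():
        print(construct, round(ratio, 2))
```

A fixed fold cutoff is used here for simplicity; a replicate-based statistical test could be substituted where "significantly higher" is meant in the statistical sense.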

Super Arrestin is used to increase the binding efficiency of arrestin to an
activated GPCR and to stabilize the GPCR/arrestin complex during GPCR
desensitization. This application can be used to increase the robustness of
ICAST/GPCR applications in cases where the GPCR is normally resensitized
rapidly after desensitization.

The assays of this invention, and their application and preparation have
been described both generically, and by specific example. The examples are not
intended as limiting. Other substituent identities, characteristics and assays will
occur to those of ordinary skill in the art, without the exercise of inventive faculty.
Such modifications remain within the scope of the invention, unless excluded by
the express recitation of the claims advanced below.

WHAT IS CLAIMED IS:

1. A method of assessing the effect of a test condition on G-protein-coupled
receptor (GPCR) pathway activity, comprising:

a) providing a cell that expresses a GPCR as a fusion protein to one mutant
form of reporter enzyme and an interacting protein partner as a fusion to another
mutant form of enzyme,

wherein said cell also expresses an arrestin, wherein said arrestin is
modified to enhance binding of said arrestin to said GPCR, wherein said enhanced
binding between said arrestin and said GPCR increases sensitivity of detection of
said effect of said test condition;

b) exposing the cell to a ligand for said GPCR under said test condition; and 

c) monitoring activation of said GPCR by complementation of said reporter 
enzyme; 

wherein increased reporter enzyme activity in the cell compared to that
which occurs in the absence of said test condition indicates increased GPCR
interaction with its interacting protein partner compared to that which occurs in the
absence of said test condition, and decreased reporter enzyme activity in the cell
compared to that which occurs in the absence of said test condition indicates
decreased GPCR interaction with its interacting protein partner compared to that
which occurs in the absence of said test condition.

2. A method of assessing the effect of a test condition on G-protein- 
coupled receptor (GPCR) pathway activity, comprising: 

a) providing a cell that expresses a GPCR as a fusion protein to one mutant
form of reporter enzyme and an interacting protein partner as a fusion to another
mutant form of enzyme;

wherein said GPCR fusion protein is modified to include one or more sets
of serine/threonine clusters, wherein said one or more sets of serine/threonine
clusters enhance binding of said GPCR to arrestin, wherein said enhanced binding
between said GPCR and said arrestin increases sensitivity of detection of said
effect of said test condition;

b) exposing the cell to a ligand for said GPCR under said test condition; and 

c) monitoring activation of said GPCR by complementation of said reporter
enzyme;

wherein increased reporter enzyme activity in the cell compared to that
which occurs in the absence of said test condition indicates increased GPCR
interaction with said interacting protein partner compared to that which occurs in
the absence of said test condition, and decreased reporter enzyme activity in the
cell compared to that which occurs in the absence of said test condition indicates
decreased GPCR interaction with said interacting protein partner compared to that
which occurs in the absence of said test condition.

3. A DNA molecule comprising a sequence encoding a biologically active
hybrid GPCR, wherein said hybrid GPCR comprises a GPCR as a fusion protein to
one mutant form of reporter enzyme and wherein said hybrid GPCR is modified to
include one or more sets of serine/threonine clusters, wherein said one or more sets
of serine/threonine clusters enhance binding of said hybrid GPCR to arrestin.

4. A DNA construct capable of directing the expression of a biologically
active hybrid GPCR in a cell, comprising the following operatively linked
elements:

a promoter; and

a DNA molecule comprising a sequence encoding a biologically active
hybrid GPCR, wherein said hybrid GPCR comprises a GPCR as a fusion protein to
one mutant form of reporter enzyme and wherein said hybrid GPCR is modified to
include one or more sets of serine/threonine clusters, wherein said one or more sets
of serine/threonine clusters enhance binding of said hybrid GPCR to arrestin.

5. A cell transformed with a DNA construct capable of expressing a
biologically active hybrid GPCR in a cell, comprising the following operatively
linked elements:

a promoter; and

a DNA molecule comprising a sequence encoding a biologically active
hybrid GPCR, wherein said hybrid GPCR comprises a GPCR as a fusion protein to
one mutant form of reporter enzyme and wherein said hybrid GPCR is modified to
include one or more sets of serine/threonine clusters, wherein said one or more sets
of serine/threonine clusters enhance binding of said hybrid GPCR to arrestin.

6. A DNA molecule comprising a sequence encoding a biologically active
hybrid arrestin, wherein said hybrid arrestin comprises an arrestin as a fusion to
one mutant form of reporter enzyme and wherein said hybrid arrestin is modified
to enhance binding of said arrestin to GPCR.

7. A DNA construct capable of directing the expression of a biologically
active hybrid arrestin in a cell, comprising the following operatively linked
elements:

a promoter; and

a DNA molecule comprising a sequence encoding a biologically active
hybrid arrestin, wherein said hybrid arrestin comprises an arrestin as a fusion to
one mutant form of reporter enzyme and wherein said hybrid arrestin is modified
to enhance binding of said arrestin to GPCR.

8. A cell transformed with a DNA construct capable of expressing a
biologically active hybrid arrestin in a cell, comprising the following operatively
linked elements:

a promoter; and

a DNA molecule comprising a sequence encoding a biologically active
hybrid arrestin, wherein said hybrid arrestin comprises an arrestin as a fusion to
one mutant form of reporter enzyme and wherein said hybrid arrestin is modified
to enhance binding of said arrestin to GPCR.

9. A method of assessing the effect of a test condition on G-protein-coupled
receptor (GPCR) pathway activity, comprising:

a) providing a cell that expresses a GPCR as a fusion protein to one mutant
form of reporter enzyme and an interacting protein partner as a fusion to another
mutant form of enzyme,

wherein said cell also expresses an arrestin, wherein said arrestin is
modified by introducing a point mutation in a phosphorylation-recognition domain
to remove a requirement for phosphorylation of said GPCR for arrestin binding to
permit binding of said arrestin to said GPCR in said cell regardless of whether said

-45- 



WO 01/58923 



PCT/US01/00684 



GPCR is phosphorylated,

b) exposing the cell to a ligand for said GPCR under said test condition; and

c) monitoring activation of said GPCR by complementation of said reporter
enzyme;

wherein increased reporter enzyme activity in the cell compared to that
which occurs in the absence of said test condition indicates increased GPCR
interaction with its interacting protein partner compared to that which occurs in the
absence of said test condition, and decreased reporter enzyme activity in the cell
compared to that which occurs in the absence of said test condition indicates
decreased GPCR interaction with its interacting protein partner compared to that
which occurs in the absence of said test condition.

10. The method of Claim 9, wherein said arrestin is mutated to increase a
property selected from affinity and avidity for activated, non-phosphorylated
GPCR.

11. The method of Claim 10, wherein said arrestin is β-arrestin2 and
wherein said β-arrestin2 is mutated to convert Arg169 to an oppositely charged
residue.

12. The method of Claim 11, wherein said oppositely charged residue is
selected from the group consisting of histidine, tyrosine, phenylalanine and
threonine.

13. The method of Claim 9, wherein said arrestin is mutated to increase a
property selected from affinity and avidity for activated and phosphorylated GPCR.

14. A method of assessing the effect of a test condition on G-protein- 

coupled receptor (GPCR) pathway activity, comprising: 

a) providing a cell that expresses a GPCR as a fusion protein to one mutant 
form of reporter enzyme and an interacting protein partner as a fusion to another 
mutant form of enzyme; 

wherein said GPCR fusion protein is modified to include one or more sets
of serine/threonine clusters, said one or more serine/threonine clusters defined as
serine or threonine residues occupying three consecutive or three out of four
positions in a carboxyl-terminus of said GPCR, wherein said one or more sets of
serine/threonine clusters enhance binding of said GPCR to arrestin, wherein said
enhanced binding between said GPCR and said arrestin increases sensitivity of
detection of said effect of said test condition;

b) exposing the cell to a ligand for said GPCR under said test condition; and 

c) monitoring activation of said GPCR by complementation of said reporter 
enzyme; 

wherein increased reporter enzyme activity in the cell compared to that
which occurs in the absence of said test condition indicates increased GPCR
interaction with said interacting protein partner compared to that which occurs in
the absence of said test condition, and decreased reporter enzyme activity in the
cell compared to that which occurs in the absence of said test condition indicates
decreased GPCR interaction with said interacting protein partner compared to that
which occurs in the absence of said test condition.
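
The cluster definition recited in Claim 14 (serine or threonine residues occupying three consecutive, or three out of four, positions in the carboxyl-terminus) lends itself to a simple window scan. The following Python sketch is offered only as an illustration of that definition under stated assumptions: the function name is hypothetical, the tail is supplied as one-letter amino-acid codes, overlapping qualifying windows are merged into one cluster, and the example tail is an invented string rather than any real receptor sequence.

```python
def find_st_clusters(tail):
    """Locate serine/threonine clusters in a C-terminal tail.

    A window qualifies if it holds three consecutive S/T residues, or
    S/T residues at three out of four consecutive positions. Returns a
    list of (zero_based_start, cluster_sequence) tuples.
    """
    tail = tail.upper()
    is_st = [residue in "ST" for residue in tail]

    # Collect every qualifying window as a half-open (start, end) span.
    windows = []
    for i in range(len(tail)):
        if i + 3 <= len(tail) and all(is_st[i:i + 3]):
            windows.append((i, i + 3))
        elif i + 4 <= len(tail) and sum(is_st[i:i + 4]) >= 3:
            windows.append((i, i + 4))

    # Merge overlapping windows into single clusters.
    clusters = []
    for start, end in windows:
        if clusters and start < clusters[-1][1]:
            clusters[-1] = (clusters[-1][0], max(end, clusters[-1][1]))
        else:
            clusters.append((start, end))

    # Trim flanking residues that are not S/T.
    result = []
    for start, end in clusters:
        while not is_st[start]:
            start += 1
        while not is_st[end - 1]:
            end -= 1
        result.append((start, tail[start:end]))
    return result


if __name__ == "__main__":
    # Invented tail used purely for illustration.
    print(find_st_clusters("AASSSAAKTATSAA"))
```

Merging overlapping windows is a design choice of this sketch; the claim language itself does not say whether adjacent qualifying windows count as one cluster or several.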

15. The method of Claim 1, wherein said modified arrestin exhibits
enhanced binding to activated, phosphorylated GPCR.

25. The method of Claim 14, wherein said modified arrestin comprises
conversion of Arg170 to an amino acid selected from the group consisting of
histidine, tyrosine, phenylalanine and threonine.

Agonist Stimulated cAMP Response in C2 Cells Expressing β2AR-βgalΔ

β-galactosidase Complementation as a Measurement for β2AR-βgalΔα

β-galactosidase Activity in Response to Agonist in C2 Cells
Coexpressing β2AR-βgalΔα and βArrestin1-βgalΔω Fusion Proteins

Inhibition of β-galactosidase Activity in C2 Cells Coexpressing
β2AR-βgalΔα and βArrestin2-βgalΔω Fusion Proteins

Antagonist Inhibition of β-galactosidase Activity in C2 Cells
Coexpressing β2AR-βgalΔα and βArrestin1-βgalΔω Fusion Proteins

Agonist Stimulated cAMP Response in Clones or Pools of C2 Cells

Agonist Stimulated cAMP Response in Clones or Pools of C2 Cells

β2AR-βgalΔω and βarr2-βgalΔα Interaction in HEK293
Clones in Response to Isoproterenol Treatment (1 µM)

β-galactosidase Complementation as a Measurement for
Adrenergic Receptor Homodimerization in HEK 293 Cells
Coexpressing β2AR-βgalΔα and β2AR-βgalΔω.
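
The sequence listing that follows prints each 50-base stretch as a top strand over its complementary strand, with a "+2" frame translation above the coding regions. Because the two strands and the translation are redundant, any one of them can be checked against the others. The Python sketch below illustrates such a check on the block numbered 1451; it is an aid to reading the listing, not part of the disclosure. The helper names are hypothetical, the standard genetic code is assumed, and the top strand is written with T at position 1458, as implied by the complementary strand.

```python
BASES = "TCAG"
# Standard genetic code, ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(strand):
    """Base-by-base complement, in the same left-to-right order as the listing."""
    return strand.upper().translate(COMPLEMENT)


def translate(strand, offset=0):
    """Translate complete codons starting at a zero-based offset."""
    strand = strand.upper()
    return "".join(
        CODON_TABLE[strand[i:i + 3]]
        for i in range(offset, len(strand) - 2, 3)
    )


# Block 1451-1500 of the listing, with the spaces between groups removed.
TOP = "CTCGAGATGGGCGTGATTACGGATTCACTGGCCGTCGTGGCCCGCACCGA"
BOTTOM = "GAGCTCTACCCGCACTAATGCCTAAGTGACCGGCAGCACCGGGCGTGGCT"

if __name__ == "__main__":
    print(complement(TOP) == BOTTOM)
    # The ATG at position 1457 lies six bases into this block.
    print(translate(TOP, offset=6))
```

The same two helpers can be applied block by block to resolve characters that are illegible in one strand but legible in the other.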

1 CTGCAGCCTG AATATGGGCC AAACAGGATA TCTGTGGTAA GCAGTTCCTG
GACGTCGGAC TTATACCCGG TTTGTCCTAT AGACACCATT CGTCAAGGAC 



51 CCCCGGCTCA GGGCCAAGAA CAGATGGAAC AGCTGAATAT GGGCCAAACA 
GGGGCCGAGT CCCGGTTCTT GTCTACCTTG TCGACTTATA CCCGGTTTGT 



101 GGATATCTGT GGTAAGCAGT TCCTGCCCCG GCTCAGGGCC AAGAACAGAT 
CCTATAGACA CCATTCGTCA AGGACGGGGC CGAGTCCCGG TTCTTGTCTA 



151 GGTCCCCAGA TGCGGTCCAG CCCTCAGCAG TTTCTAGAGA ACCATCAGAT 
CCAGGGGTCT ACGCCAGGTC GGGAGTCGTC AAAGATCTCT TGGTAGTCTA 



201 GTTTCCAGGG TGCCCCAAGG ACCTGAAATG ACCCTGTGCC TTATTTGAAC
CAAAGGTCCC ACGGGGTTCC TGGACTTTAC TGGGACACGG AATAAACTTG 



251 TAACCAATCA GTTCGCTTCT CGCTTCTGTT CGCGCGCTTC TGCTCCCCGA 
ATTGGTTAGT CAAGCGAAGA GCGAAGACAA GCGCGCGAAG ACGAGGGGCT 



301 GCTCAATAAA AGAGCCCACA ACCCCTCACT CGGGGCGCCA GTCCTCCGAT 
CGAGTTATTT TCTCGGGTGT TGGGGAGTGA GCCCCGCGGT CAGGAGGCTA 



351 TGACTGAGTC GCCCGGGTAC CCGTGTATCC AATAAACCCT CTTGCAGTTG 
ACTGACTCAG CGGGCCCATG GGCACATAGG TTATTTGGGA GAACGTCAAC 



401 CATCCGACTT GTGGTCTCGC TGTTCCTTGG GAGGGTCTCC TCTGAGTGAT
GTAGGCTGAA CACCAGAGCG ACAAGGAACC CTCCCAGAGG AGACTCACTA 



451 TGACTACCCG TCAGCGGGGG TCTTTCATTT GGGGGCTCGT CCGGGATCGG 
ACTGATGGGC AGTCGCCCCC AGAAAGTAAA CCCCCGAGCA GGCCCTAGCC 



501 GAGACCCCTG CCCAGGGACC ACCGACCCAC CACCGGGAGG CAAGCTGGCC 
CTCTGGGGAC GGGTCCCTGG TGGCTGGGTG GTGGCCCTCC GTTCGACCGG 



551 AGCAACTTAT CTGTGTCTGT CCGATTGTCT AGTGTCTATG ACTGATTTTA
TCGTTGAATA GACACAGACA GGCTAACAGA TCACAGATAC TGACTAAAAT 



601 TGCGCCTGCG TCGGTACTAG TTAGCTAACT AGCTCTGTAT CTGGCGGACC 
ACGCGGACGC AGCCATGATC AATCGATTGA TCGAGACATA GACCGCCTGG 



651 CGTGGTGGAA CTGACGAGTT CTGAACACCC GGCCGCAACC CTGGGAGACG 
GCACCACCTT GACTGCTCAA GACTTGTGGG CCGGCGTTGG GACCCTCTGC 



701 TCCCAGGGAC TTTGGGGGCC GTTTTTGTGG CCCGACCTGA GGAAGGGAGT 
AGGGTCCCTG AAACCCCCGG CAAAAACACC GGGCTGGACT CCTTCCCTCA 



751 CGATGTGGAA TCCGACCCCG TCAGGATATG TGGTTCTGGT AGGAGACGAG 
GCTACACCTT AGGCTGGGGC AGTCCTATAC ACCAAGACCA TCCTCTGCTC 



801 AACCTAAAAC AGTTCCCGCC TCCGTCTGAA TTTTTGCTTT CGGTTTGGAA 
TTGGATTTTG TCAAGGGCGG AGGCAGACTT AAAAACGAAA GCCAAACCTT 



851 CCGAAGCCGC GCGTCTTGTC TGCTGCAGCA TCGTTCTGTG TTGTCTCTGT 
GGCTTCGGCG CGCAGAACAG ACGACGTCGT AGCAAGACAC AACAGAGACA 



901 CTGACTGTGT TTCTGTATTT GTCTGAAAAT TAGGGCCAGA CTGTTACCAC 
GACTGACACA AAGACATAAA CAGACTTTTA ATCCCGGTCT GACAATGGTG 
951 TCCCTTAAGT TTGACCTTAG GTAACTGGAA AGATGTCGAG CGGCTCGCTC
AGGGAATTCA AACTGGAATC CATTGACCTT TCTACAGCTC GCCGAGCGAG 



1001 ACAACCAGTC GGTAGATGTC AAGAAGAGAC GTTGGGTTAC CTTCTGCTCT 
TGTTGGTCAG CCATCTACAG TTCTTCTCTG CAACCCAATG GAAGACGAGA 



1051 GCAGAATGGC CAACCTTTAA GGTCGGATGG CCGCGAGACG GCACCTTTAA 
CGTCTTACCG GTTGGAAATT GCAGCCTACC GGCGCTCTGC CGTGGAAATT



1101 CCGAGACCTC ATCACCCAGG TTAAGATCAA GGTCTTTTCA CCTGGCCCGC 
GGCTCTGGAG TAGTGGGTCC AATTCTAGTT CCAGAAAAGT GGACCGGGCG



1151 ATGGACACCC AGACCAGGTC CCCTACATCG TGACCTGGGA AGCCTTGGCT
TACCTGTGGG TCTGGTCCAG GGGATGTAGC ACTGGACCCT TCGGAACCGA



1201 TTTGACCCCC CTCCCTGGGT CAAGCCCTTT GTACACCGTA AGCCTCCGCC 
AAACTGGGGG GAGGGACCCA GTTCGGGAAA CATGTGGGAT TCGGAGGCGG 



1251 TCCTCTTCCT CCATCCGCCC CGTCTCTCCC CCTTGAACCT CCTCGTTCGA 
AGGAGAAGGA GGTAGGCGGG GCAGAGAGGG GGAACTTGGA GGAGCAAGCT 



1301 CCCCGCCTCG ATCCTCCCTT TATCCAGCCC TCACTCCTTC TCTAGGCGCC 
GGGGCGGAGC TAGGAGGGAA ATAGGTCGGG AGTGAGGAAG AGATCCGCGG 



1351 GGCCGCTCTA GCCCATTAAT ACGACTCACT ATAGGGCGAT TCGAATCAGG 
CCGGCGAGAT CGGGTAATTA TGCTGAGTGA TATCCCGCTA AGCTTAGTCC 



1401 CCTTGGCGCG CCGGATCCTT AATTAAGCGC AATTGGGAGG TGGCGGTAGC 
GGAACCGCGC GGCCTAGGAA TTAATTCGCG TTAACCCTCC ACCGCCATCG 

+2 M G V I T D S L A V V A R T D 

1451 CTCGAGATGG GCGTGATTAC GGATTCACTG GCCGTCGTGG CCCGCACCGA
GAGCTCTACC CGCACTAATG CCTAAGTGAC CGGCAGCACC GGGCGTGGCT 



+ 2 RPS QQLR S L N G EW R F A 



1501 TCGCCCTTCC CAACAGTTAC GCAGCCTGAA TGGCGAATGG CGCTTTGCCT 
AGCGGGAAGG GTTGTCAATG CGTCGGACTT ACCGCTTACC GCGAAACGGA 



+2WFPA PEA V P E S W L E CDL 



1551 GGTTTCCGGC ACCAGAAGCG GTGCCGGAAA GCTGGCTGGA GTGCGATCTT 
CCAAAGGCCG TGGTCTTCGC CACGGCCTTT CGACCGACCT CACGCTAGAA 



+2PEAD T V V VPS NWQM HGY 



1601 CCTGAGGCCG ATACTGTCGT CGTCCCCTCA AACTGGCAGA TGCACGGTTA 
GGACTCCGGC TATGACAGCA GCAGGGGAGT TTGACCGTCT ACGTGCCAAT 



+2 DA P IYTN VTY PIT VNP 



1651 CGATGCGCCC ATCTACACCA ACGTGACCTA TCCCATTACG GTCAATCCGC 
GCTACGCGGG TAGATGTGGT TGCACTGGAT AGGGTAATGC CAGTTAGGCG 
+2PFVP TEN PTGC YSL TFN



1701 CGTTTGTTCC CACGGAGAAT CCGACGGGTT GTTACTCGCT CACATTTAAT 
GCAAACAAGG GTGCCTCTTA GGCTGCCCAA CAATGAGCGA GTGTAAATTA



+2VDES WLQ EGQ T R I I F D G 



1751 GTTGATGAAA GCTGGCTACA GGAAGGCCAG ACGCGAATTA TTTTTGATGG
CAACTACTTT CGACCGATGT CCTTCCGGTC TGCGCTTAAT AAAAACTACC 



+2 VNS A F H L WCN GR W VGY 



1801 CGTTAACTCG GCGTTTCATC TGTGGTGCAA CGGGCGCTGG GTCGGTTACG
GCAATTGAGC CGCAAAGTAG ACACCACGTT GCCCGCGACC CAGCCAATGC 



+2GQDS RLP S EFD LSA FLR 



1851 GCCAGGACAG TCGTTTGCCG TCTGAATTTG ACCTGAGCGC ATTTTTACGC 
CGGTCCTGTC AGCAAACGGC AGACTTAAAC TGGACTCGCG TAAAAATGCG 



+2AGEN RLA VMV LRWS DGS 



1901 GCCGGAGAAA ACCGCCTCGC GGTGATGGTG CTGCGCTGGA GTGACGGCAG 
CGGCCTCTTT TGGCGGAGCG CCACTACCAC GACGCGACCT CACTGCCGTC 



+2 YLE DQDM WR M SGI FRD 

1951 TTATCTGGAA GATCAGGATA TGTGGCGGAT GAGCGGCATT TTCCGTGACG
AATAGACCTT CTAGTCCTAT ACACCGCCTA CTCGCCGTAA AAGGCACTGC



+2 V S L L H K P T T Q I S D F H V A



2001 TCTCGTTGCT GCATAAACCG ACTACACAAA TCAGCGATTT CCATGTTGCC 
AGAGCAACGA CGTATTTGGC TGATGTGTTT AGTCGCTAAA GGTACAACGG 



+2 T R F N D D F S R A V L E A E V Q



2051 ACTCGCTTTA ATGATGATTT CAGCCGCGCT GTACTGGAGG CTGAAGTTCA 
TGAGCGAAAT TACTACTAAA GTCGGCGCGA CATGACCTCC GACTTCAAGT



+2 MCG ELRD YL R VTV SLW 



2101 GATGTGCGGC GAGTTGCGTG ACTACCTACG GGTAACAGTT TCTTTATGGC 
CTACACGCCG CTCAACGCAC TGATGGATGC CCATTGTCAA AGAAATACCG 



+2QGET Q V A S G T A PFG GEI 



2151 AGGGTGAAAC GCAGGTCGCC AGCGGCACCG CGCCTTTCGG CGGTGAAATT 
TCCCACTTTG CGTCCAGCGG TCGCCGTGGC GCGGAAAGCC GCCACTTTAA 



+2IDER GGY ADR VTLR LNV 



2201 ATCGATGAGC GTGGTGGTTA TGCCGATCGC GTCACACTAC GTCTGAACGT 
TAGCTACTCG CACCACCAAT ACGGCTAGCG CAGTGTGATG CAGACTTGCA 



+2 ENP KLWS A E I P N L YRA



2251 CGAAAACCCG AAACTGTGGA GCGCCGAAAT CCCGAATCTC TATCGTGCGG 
GCTTTTGGGC TTTGACACCT CGCGGCTTTA GGGCTTAGAG ATAGCACGCC 
+2VVEL HTA DGTL I E A E A C



2301 TGGTTGAACT GCACACCGCC GACGGCACGC TGATTGAAGC AGAAGCCTGC 
ACCAACTTGA CGTGTGGCGG CTGCCGTGCG ACTAACTTCG TCTTCGGACG



+2DVGF REV R I E N GLL LLN 



2351 GATGTCGGTT TCCGCGAGGT GCGGATTGAA AATGGTCTGC TGCTGCTGAA
CTACAGCCAA AGGCGCTCCA CGCCTAACTT TTACCAGACG ACGACGACTT 



+2 GKP L LIR GVN RHE HHP 



2401 CGGCAAGCCG TTGCTGATTC GAGGCGTTAA CCGTCACGAG CATCATCCTC 
GCCGTTCGGC AACGACTAAG CTCCGCAATT GGCAGTGCTC GTAGTAGGAG



+2LHGQ VMD EQ TM V Q D ILL



2451 TGCATGGTCA GGTCATGGAT GAGCAGACGA TGGTGCAGGA TATCCTGCTG 
ACGTACCAGT CCAGTACCTA CTCGTCTGCT ACCACGTCCT ATAGGACGAC 



+2MKQN NFN A V R CSHY PNH



2501 ATGAAGCAGA ACAACTTTAA CGCCGTGCGC TGTTCGCATT ATCCGAACCA 
TACTTCGTCT TGTTGAAATT GCGGCACGCG ACAAGCGTAA TAGGCTTGGT 



+2 PLW YTLC DR Y G L Y V V D 



2551 TCCGCTGTGG TACACGCTGT GCGACCGCTA CGGCCTGTAT GTGGTGGATG
AGGCGACACC ATGTGCGACA CGCTGGCGAT GCCGGACATA CACCACCTAC



+2 E A N I E T H G M V P M N R L T D 



2601 AAGCCAATAT TGAAACCCAC GGCATGGTGC CAATGAATCG TCTGACCGAT 
TTCGGTTATA ACTTTGGGTG CCGTACCACG GTTACTTAGC AGACTGGCTA 

+2 D P R W L P A M S E R V T R M V Q

2651 GATCCGCGCT GGCTACCGGC GATGAGCGAA CGCGTAACGC GAATGGTGCA 
CTAGGCGCGA CCGATGGCCG CTACTCGCTT GCGCATTGCG CTTACCACGT 

+2 RDR NHPS VII WSL GNE 

2701 GCGCGATCGT AATCACCCGA GTGTGATCAT CTGGTCGCTG GGGAATGAAT 
CGCGCTAGCA TTAGTGGGCT CACACTAGTA GACCAGCGAC CCCTTACTTA 

+2S GHG ANH DALY RWI KSV 

2751 CAGGCCACGG CGCTAATCAC GACGCGCTGT ATCGCTGGAT CAAATCTGTC 
GTCCGGTGCC GCGATTAGTG CTGCGCGACA TAGCGACCTA GTTTAGACAG 

+2DPSR PVQ YEG G G A D TTA 

2801 GATCCTTCCC GCCCGGTGCA GTATGAAGGC GGCGGAGCCG ACACCACGGC 
CTAGGAAGGG CGGGCCACGT CATACTTCCG CCGCCTCGGC TGTGGTGCCG 

+2 TDI ICP M Y A R VDE DQP 



2851 CACCGATATT ATTTGCCCGA TGTACGCGCG CGTGGATGAA GACCAGCCCT 
GTGGCTATAA TAAACGGGCT ACATGCGCGC GCACCTACTT CTGGTCGGGA 
+2FPAV PKW SIKK W L S LPG



2901 TCCCGGCTGT GCCGAAATGG TCCATCAAAA AATGGCTTTC GCTACCTGGA 
AGGGCCGACA CGGCTTTACC AGGTAGTTTT TTACCGAAAG CGATGGACCT 



+2ETRP L I L C E Y A H A M GNS 



2951 GAGACGCGCC CGCTGATCCT TTGCGAATAC GCCCACGCGA TGGGTAACAG
CTCTGCGCGG GCGACTAGGA AACGCTTATG CGGGTGCGCT ACCCATTGTC 



+2 LGG FAKY W Q A F R Q YPR 



3001 TCTTGGCGGT TTCGCTAAAT ACTGGCAGGC GTTTCGTCAG TATCCCCGTT 
AGAACCGCCA AAGCGATTTA TGACCGTCCG CAAAGCAGTC ATAGGGGCAA 



+2LQGG FVW DWVD QSL IKY 



3051 TACAGGGCGG CTTCGTCTGG GACTGGGTGG ATCAGTCGCT GATTAAATAT 
ATGTCCCGCC GAAGCAGACC CTGACCCACC TAGTCAGCGA CTAATTTATA 



+2 DENG N P W SAY GGDF GDT 



3101 GATGAAAACG GCAACCCGTG GTCGGCTTAC GGCGGTGATT TTGGCGATAC 
CTACTTTTGC CGTTGGGCAC CAGCCGAATG CCGCCACTAA AACCGCTATG 



+2 PND R Q F C MNG LVF ADR 



3151 GCCGAACGAT CGCCAGTTCT GTATGAACGG TCTGGTCTTT GCCGACCGCA 
CGGCTTGCTA GCGGTCAAGA CATACTTGCC AGACCAGAAA CGGCTGGCGT



+2TPHP ALT EAKH Q QQ FFQ

3201 CGCCGCATCC AGCGCTGACG GAAGCAAAAC ACCAGCAGCA GTTTTTCCAG 
GCGGCGTAGG TCGCGACTGC CTTCGTTTTG TGGTCGTCGT CAAAAAGGTC 



+2FRL S GQT IEV TSEY L F R 



3251 TTCCGTTTAT CCGGGCAAAC CATCGAAGTG ACCAGCGAAT ACCTGTTCCG 
AAGGCAAATA GGCCCGTTTG GTAGCTTCAC TGGTCGCTTA TGGACAAGGC 



+2 H SD NE LL HWM V A L DGK



3301 TCATAGCGAT AACGAGCTCC TGCACTGGAT GGTGGCGCTG GATGGTAAGC 
AGTATCGCTA TTGCTCGAGG ACGTGACCTA CCACCGCGAC CTACCATTCG 



+2PLA S G E V PLDV APQ GKQ 



3351 CGCTGGCAAG CGGTGAAGTG CCTCTGGATG TCGCTCCACA AGGTAAACAG 
GCGACCGTTC GCCACTTCAC GGAGACCTAC AGCGAGGTGT TCCATTTGTC 



+2LIEL PEL PQP ESAG QLW 



3401 TTGATTGAAC TGCCTGAACT ACCGCAGCCG GAGAGCGCCG GGCAACTCTG 
AACTAACTTG ACGGACTTGA TGGCGTCGGC CTCTCGCGGC CCGTTGAGAC 



+2 LTV RVVQ P N A TAW SEA 



3451 GCTCACAGTA CGCGTAGTGC AACCGAACGC GACCGCATGG TCAGAAGCCG
CGAGTGTCAT GCGCATCACG TTGGCTTGCG CTGGCGTACC AGTCTTCGGC
+2GHIS A W Q QWRL A E N LSV

3501 GGCACATCAG CGCCTGGCAG CAGTGGCGTC TGGCGGAAAA CCTCAGTGTG 
CCGTGTAGTC GCGGACCGTC GTCACCGCAG ACCGCCTTTT GGAGTCACAC 



+2TLPA ASH AIP HLT T SEM

3551 ACGCTCCCCG CCGCGTCCCA CGCCATCCCG CATCTGACCA CCAGCGAAAT 
TGCGAGGGGC GGCGCAGGGT GCGGTAGGGC GTAGACTGGT GGTCGCTTTA 



+2 D F C IELG NKR W QF NRQ

3601 GGATTTTTGC ATCGAGCTGG GTAATAAGCG TTGGCAATTT AACCGCCAGT 
CCTAAAAACG TAGCTCGACC CATTATTCGC AACCGTTAAA TTGGCGGTCA 



+2SGFL SQM W IG D KKQ L LT 



3651 CAGGCTTTCT TTCACAGATG TGGATTGGCG ATAAAAAACA ACTGCTGACG 
GTCCGAAAGA AAGTGTCTAC ACCTAACCGC TATTTTTTGT TGACGACTGC



+2PLRD Q F T RAP LDND IGV

3701 CCGCTGCGCG ATCAGTTCAC CCGTGCACCG CTGGATAACG ACATTGGCGT 
GGCGACGCGC TAGTCAAGTG GGCACGTGGC GACCTATTGC TGTAACCGCA 



+2 SE A T R I D PNA WV E RWK 



3751 AAGTGAAGCG ACCCGCATTG ACCCTAACGC CTGGGTCGAA CGCTGGAAGG 
TTCACTTCGC TGGGCGTAAC TGGGATTGCG GACCCAGCTT GCGACCTTCC 



+2AAGH Y QA E A AL L QC TAD



3801 CGGCGGGCCA TTACCAGGCC GAAGCAGCGT TGTTGCAGTG CACGGCAGAT
GCCGCCCGGT AATGGTCCGG CTTCGTCGCA ACAACGTCAC GTGCCGTCTA



+2TLA D A V L I T T A H A W Q HQ

3851 ACACTTGCTG ATGCGGTGCT GATTACGACC GCTCACGCGT GGCAGCATCA 
TGTGAACGAC TACGCCACGA CTAATGCTGG CGAGTGCGCA CCGTCGTAGT



+2 GKT LFIS RKT YRI DGS 

3901 GGGGAAAACC TTATTTATCA GCCGGAAAAC CTACCGGATT GATGGTAGTG 
CCCCTTTTGG AATAAATAGT CGGCCTTTTG GATGGCCTAA CTACCATCAC



+2 G Q M A ITV DVEV ASD TPH 



3951 GTCAAATGGC GATTACCGTT GATGTTGAAG TGGCGAGCGA TACACCGCAT 
CAGTTTACCG CTAATGGCAA CTACAACTTC ACCGCTCGCT ATGTGGCGTA 



+2 PARI GLN C Q L A Q V A E R V 

4001 CCGGCGCGGA TTGGCCTGAA CTGCCAGCTG GCGCAGGTAG CAGAGCGGGT 
GGCCGCGCCT AACCGGACTT GACGGTCGAC CGCGTCCATC GTCTCGCCCA



+2 NWL GLGP QEN Y PD RLT

4051 AAACTGGCTC GGATTAGGGC CGCAAGAAAA CTATCCCGAC CGCCTTACTG 
TTTGACCGAG CCTAATCCCG GCGTTCTTTT GATAGGGCTG GCGGAATGAC 
+2 A A C F DRW DLPL SDM YTP



4101 CCGCCTGTTT TGACCGCTGG GATCTGCCAT TGTCAGACAT GTATACCCCG 
GGCGGACAAA ACTGGCGACC CTAGACGGTA ACAGTCTGTA CATATGGGGC 



+2 Y V F P SEN GLR CGT R ELN 



4151 TACGTCTTCC CGAGCGAAAA CGGTCTGCGC TGCGGGACGC GCGAATTGAA 
ATGCAGAAGG GCTCGCTTTT GCCAGACGCG ACGCCCTGCG CGCTTAACTT 



+2 YGP HQW R GDF QFN ISR



4201 TTATGGCCCA CACCAGTGGC GCGGCGACTT CCAGTTCAAC ATCAGCCGCT
AATACCGGGT GTGGTCACCG CGCCGCTGAA GGTCAAGTTG TAGTCGGCGA 



+2 Y S Q Q Q L M E T S H R H L L H A 

4251 ACAGTCAACA GCAACTGATG GAAACCAGCC ATCGCCATCT GCTGCACGCG
TGTCAGTTGT CGTTGACTAC CTTTGGTCGG TAGCGGTAGA CGACGTGCGC 



+2EEGT WLN I DG FHMG IGG



4301 GAAGAAGGCA CATGGCTGAA TATCGACGGT TTCCATATGG GGATTGGTGG
CTTCTTCCGT GTACCGACTT ATAGCTGCCA AAGGTATACC CCTAACCACC 



+2 DDS W S PS V S A EFQ L SA 



4351 CGACGACTCC TGGAGCCCGT CAGTATCGGC GGAATTCCAG CTGAGCGCCG
GCTGCTGAGG ACCTCGGGCA GTCATAGCCG CCTTAAGGTC GACTCGCGGC 



+2 G R Y H Y Q L V W C Q K R S D Y K

4401 GTCGCTACCA TTACCAGTTG GTCTGGTGTC AAAAAAGATC TGACTATAAA 
CAGCGATGGT AATGGTCAAC CAGACCACAG TTTTTTCTAG ACTGATATTT 



+2D EDL DHH H HH HR 

4451 GATGAGGACC TCGACCATCA TCATCATCAT CACCGGTAAT AATAGGTAGA 
CTACTCCTGG AGCTGGTAGT AGTAGTAGTA GTGGCCATTA TTATCCATCT 



4501 TAAGTGACTG ATTAGATGCA TTGATCCCTC GACCAATTCC GGTTATTTTC
ATTCACTGAC TAATCTACGT AACTAGGGAG CTGGTTAAGG CCAATAAAAG



4551 CACCATATTG CCGTCTTTTG GCAATGTGAG GGCCCGGAAA CCTGGCCCTG 
GTGGTATAAC GGCAGAAAAC CGTTACACTC CCGGGCCTTT GGACCGGGAC 



4601 TCTTCTTGAC GAGCATTCCT AGGGGTCTTT CCCCTCTCGC CAAAGGAATG 
AGAAGAACTG CTCGTAAGGA TCCCCAGAAA GGGGAGAGCG GTTTCCTTAC 



4651 CAAGGTCTGT TGAATGTCGT GAAGGAAGCA GTTCCTCTGG AAGCTTCTTG 
GTTCCAGACA ACTTACAGCA CTTCCTTCGT CAAGGAGACC TTCGAAGAAC 



4701 AAGACAAACA ACGTCTGTAG CGACCCTTTG CAGGCAGCGG AACCCCCCAC 
TTCTGTTTGT TGCAGACATC GCTGGGAAAC GTCCGTCGCC TTGGGGGGTG 



4751 CTGGCGACAG GTGCCTCTGC GGCCAAAAGC CACGTGTATA AGATACACCT
GACCGCTGTC CACGGAGACG CCGGTTTTCG GTGCACATAT TCTATGTGGA 
4801 GCAAAGGCGG CACAACCCCA GTGCCACGTT GTGAGTTGGA TAGTTGTGGA
CGTTTCCGCC GTGTTGGGGT CACGGTGCAA CACTCAACCT ATCAACACCT 



4851 AAGAGTCAAA TGGCTCTCCT CAAGCGTATT CAACAAGGGG CTGAAGGATG
TTCTCAGTTT ACCGAGAGGA GTTCGCATAA GTTGTTCCCC GACTTCCTAC



4901 CCCAGAAGGT ACCCCATTGT ATGGGATCTG ATCTGGGGCC TCGGTGCACA
GGGTCTTCCA TGGGGTAACA TACCCTAGAC TAGACCCCGG AGCCACGTGT



4951 TGCTTTACAT GTGTTTAGTC GAGGTTAAAA AACGTCTAGG CCCCCCGAAC
ACGAAATGTA CACAAATCAG CTCCAATTTT TTGCAGATCC GGGGGGCTTG 



5001 CACGGGGACG TGGTTTTCCT TTGAAAAACA CGATGATAAT ACCATGATTG 
GTGCCCCTGC ACCAAAAGGA AACTTTTTGT GCTACTATTA TGGTACTAAC 



5051 AACAAGATGG ATTGCACGCA GGTTCTCCGG CCGCTTGGGT GGAGAGGCTA
TTGTTCTACC TAACGTGCGT CCAAGAGGCC GGCGAACCCA CCTCTCCGAT 



5101 TTCGGCTATG ACTGGGCACA ACAGACAATC GGCTGCTCTG ATGCCGCCGT 
AAGCCGATAC TGACCCGTGT TGTCTGTTAG CCGACGAGAC TACGGCGGCA 



5151 GTTCCGGCTG TCAGCGCAGG GGCGCCCGGT TCTTTTTGTC AAGACCGACC 
CAAGGCCGAC AGTCGCGTCC CCGCGGGCCA AGAAAAACAG TTCTGGCTGG



5201 TGTCCGGTGC CCTGAATGAA CTGCAGGACG AGGCAGCGCG GCTATCGTGG
ACAGGCCACG GGACTTACTT GACGTCCTGC TCCGTCGCGC CGATAGCACC 



5251 CTGGCCACGA CGGGCGTTCC TTGCGCAGCT GTGCTCGACG TTGTCACTGA
GACCGGTGCT GCCCGCAAGG AACGCGTCGA CACGAGCTGC AACAGTGACT 



5301 AGCGGGAAGG GACTGGCTGC TATTGGGCGA AGTGCCGGGG CAGGATCTCC 
TCGCCCTTCC CTGACCGACG ATAACCCGCT TCACGGCCCC GTCCTAGAGG 



5351 TGTCATCTCA CCTTGCTCCT GCCGAGAAAG TATCCATCAT GGCTGATGCA
ACAGTAGAGT GGAACGAGGA CGGCTCTTTC ATAGGTAGTA CCGACTACGT



5401 ATGCGGCGGC TGCATACGCT TGATCCGGCT ACCTGCCCAT TCGACCACCA 
TACGCCGCCG ACGTATGCGA ACTAGGCCGA TGGACGGGTA AGCTGGTGGT 



5451 AGCGAAACAT CGCATCGAGC GAGCACGTAC TCGGATGGAA GCCGGTCTTG 
TCGCTTTGTA GCGTAGCTCG CTCGTGCATG AGCCTACCTT CGGCCAGAAC



5501 TCGATCAGGA TGATCTGGAC GAAGAGCATC AGGGGCTCGC GCCAGCCGAA 
AGCTAGTCCT ACTAGACCTG CTTCTCGTAG TCCCCGAGCG CGGTCGGCTT 



5551 CTGTTCGCCA GGCTCAAGGC GCGCATGCCC GACGGCGAGG ATCTCGTCGT 
GACAAGCGGT CCGAGTTCCG CGCGTACGGG CTGCCGCTCC TAGAGCAGCA 



5601 GACCCATGGC GATGCCTGCT TGCCGAATAT CATGGTGGAA AATGGCCGCT 
CTGGGTACCG CTACGGACGA ACGGCTTATA GTACCACCTT TTACCGGCGA



5651 TTTCTGGATT CATCGACTGT GGCCGGCTGG GTGTGGCGGA CCGCTATCAG
AAAGACCTAA GTAGCTGACA CCGGCCGACC CACACCGCCT GGCGATAGTC



5701 GACATAGCGT TGGCTACCCG TGATATTGCT GAAGAGCTTG GCGGCGAATG 
CTGTATCGCA ACCGATGGGC ACTATAACGA CTTCTCGAAC CGCCGCTTAC



5751 GGCTGACCGC TTCCTCGTGC TTTACGGTAT CGCCGCTCCC GATTCGCAGC
CCGACTGGCG AAGGAGCACG AAATGCCATA GCGGCGAGGG CTAAGCGTCG



5801 GCATCGCCTT CTATCGCCTT CTTGACGAGT TCTTCTGAGC GGGACTCTGG 
CGTAGCGGAA GATAGCGGAA GAACTGCTCA AGAAGACTCG CCCTGAGACC 



5851 GGTTCGCATC GATAAAATAA AAGATTTTAT TTAGTCTCCA GAAAAAGGGG 
CCAAGCGTAG CTATTTTATT TTCTAAAATA AATCAGAGGT CTTTTTCCCC 



5901 GGAATGAAAG ACCCCACCTG TAGGTTTGGC AAGCTAGCTT AAGTAACGCC
CCTTACTTTC TGGGGTGGAC ATCCAAACCG TTCGATCGAA TTCATTGCGG 



5951 ATTTTGCAAG GCATGGAAAA ATACATAACT GAGAATAGAG AAGTTCAGAT 
TAAAACGTTC CGTACCTTTT TATGTATTGA CTCTTATCTC TTCAAGTCTA 



6001 CAAGGTCAGG AACAGATGGA ACAGCTGAAT ATGGGCCAAA CAGGATATCT
GTTCCAGTCC TTGTCTACCT TGTCGACTTA TACCCGGTTT GTCCTATAGA 



6051 GTGGTAAGCA GTTCCTGCCC CGGCTCAGGG CCAAGAACAG ATGGAACAGC 
CACCATTCGT CAAGGACGGG GCCGAGTCCC GGTTCTTGTC TACCTTGTCG 



6101 TGAATATGGG CCAAACAGGA TATCTGTGGT AAGCAGTTCC TGCCCCGGCT
ACTTATACCC GGTTTGTCCT ATAGACACCA TTCGTCAAGG ACGGGGCCGA 



6151 CAGGGCCAAG AACAGATGGT CCCCAGATGC GGTCCAGCCC TCAGCAGTTT 
GTCCCGGTTC TTGTCTACCA GGGGTCTACG CCAGGTCGGG AGTCGTCAAA



6201 CTAGAGAACC ATCAGATGTT TCCAGGGTGC CCCAAGGACC TGAAATGACC 
GATCTCTTGG TAGTCTACAA AGGTCCCACG GGGTTCCTGG ACTTTACTGG 



6251 CTGTGCCTTA TTTGAACTAA CCAATCAGTT CGCTTCTCGC TTCTGTTCGC
GACACGGAAT AAACTTGATT GGTTAGTCAA GCGAAGAGCG AAGACAAGCG 



6301 GCGCTTCTGC TCCCCGAGCT CAATAAAAGA GCCCACAACC CCTCACTCGG 
CGCGAAGACG AGGGGCTCGA GTTATTTTCT CGGGTGTTGG GGAGTGAGCC 



6351 GGCGCCAGTC CTCCGATTGA CTGAGTCGCC CGGGTACCCG TGTATCCAAT 
CCGCGGTCAG GAGGCTAACT GACTCAGCGG GCCCATGGGC ACATAGGTTA 



6401 AAACCCTCTT GCAGTTGCAT CCGACTTGTG GTCTCGCTGT TCCTTGGGAG
TTTGGGAGAA CGTCAACGTA GGCTGAACAC CAGAGCGACA AGGAACCCTC 



6451 GGTCTCCTCT GAGTGATTGA CTACCCGTCA GCGGGGGTCT TTCATTCATG
CCAGAGGAGA CTCACTAACT GATGGGCAGT CGCCCCCAGA AAGTAAGTAC 



6501 CAGCATGTAT CAAAATTAAT TTGGTTTTTT TTCTTAAGTA TTTACATTAA 
GTCGTACATA GTTTTAATTA AACCAAAAAA AAGAATTCAT AAATGTAATT 



6551 ATGGCCATAG TTGCATTAAT GAATCGGCCA ACGCGCGGGG AGAGGCGGTT
TACCGGTATC AACGTAATTA CTTAGCCGGT TGCGCGCCCC TCTCCGCCAA 



6601 TGCGTATTGG CGCTCTTCCG CTTCCTCGCT CACTGACTCG CTGCGCTCGG 
ACGCATAACC GCGAGAAGGC GAAGGAGCGA GTGACTGAGC GACGCGAGCC 



6651 TCGTTCGGCT GCGGCGAGCG GTATCAGCTC ACTCAAAGGC GGTAATACGG
AGCAAGCCGA CGCCGCTCGC CATAGTCGAG TGAGTTTCCG CCATTATGCC



1 CTGCAGCCTG AATATGGGCC AAACAGGATA TCTGTGGTAA GCAGTTCCTG
GACGTCGGAC TTATACCCGG TTTGTCCTAT AGACACCATT CGTCAAGGAC 



51 CCCCGGCTCA GGGCCAAGAA CAGATGGAAC AGCTGAATAT GGGCCAAACA 
GGGGCCGAGT CCCGGTTCTT GTCTACCTTG TCGACTTATA CCCGGTTTGT 

101 GGATATCTGT GGTAAGCAGT TCCTGCCCCG GCTCAGGGCC AAGAACAGAT 
CCTATAGACA CCATTCGTCA AGGACGGGGC CGAGTCCCGG TTCTTGTCTA



151 GGTCCCCAGA TGCGGTCCAG CCCTCAGCAG TTTCTAGAGA ACCATCAGAT 
CCAGGGGTCT ACGCCAGGTC GGGAGTCGTC AAAGATCTCT TGGTAGTCTA



201 GTTTCCAGGG TGCCCCAAGG ACCTGAAATG ACCCTGTGCC TTATTTGAAC 
CAAAGGTCCC ACGGGGTTCC TGGACTTTAC TGGGACACGG AATAAACTTG 



251 TAACCAATCA GTTCGCTTCT CGCTTCTGTT CGCGCGCTTC TGCTCCCCGA 
ATTGGTTAGT CAAGCGAAGA GCGAAGACAA GCGCGCGAAG ACGAGGGGCT 



301 GCTCAATAAA AGAGCCCACA ACCCCTCACT CGGGGCGCCA GTCCTCCGAT 
CGAGTTATTT TCTCGGGTGT TGGGGAGTGA GCCCCGCGGT CAGGAGGCTA 



351 TGACTGAGTC GCCCGGGTAC CCGTGTATCC AATAAACCCT CTTGCAGTTG 
ACTGACTCAG CGGGCCCATG GGCACATAGG TTATTTGGGA GAACGTCAAC 



401 CATCCGACTT GTGGTCTCGC TGTTCCTTGG GAGGGTCTCC TCTGAGTGAT 
GTAGGCTGAA CACCAGAGCG ACAAGGAACC CTCCCAGAGG AGACTCACTA 



451 TGACTACCCG TCAGCGGGGG TCTTTCATTT GGGGGCTCGT CCGGGATCGG 
ACTGATGGGC AGTCGCCCCC AGAAAGTAAA CCCCCGAGCA GGCCCTAGCC 



501 GAGACCCCTG CCCAGGGACC ACCGACCCAC CACCGGGAGG CAAGCTGGCC 
CTCTGGGGAC GGGTCCCTGG TGGCTGGGTG GTGGCCCTCC GTTCGACCGG 



551 AGCAACTTAT CTGTGTCTGT CCGATTGTCT AGTGTCTATG ACTGATTTTA
TCGTTGAATA GACACAGACA GGCTAACAGA TCACAGATAC TGACTAAAAT 



601 TGCGCCTGCG TCGGTACTAG TTAGCTAACT AGCTCTGTAT CTGGCGGACC 
ACGCGGACGC AGCCATGATC AATCGATTGA TCGAGACATA GACCGCCTGG 



651 CGTGGTGGAA CTGACGAGTT CTGAACACCC GGCCGCAACC CTGGGAGACG 
GCACCACCTT GACTGCTCAA GACTTGTGGG CCGGCGTTGG GACCCTCTGC 



701 TCCCAGGGAC TTTGGGGGCC GTTTTTGTGG CCCGACCTGA GGAAGGGAGT 
AGGGTCCCTG AAACCCCCGG CAAAAACACC GGGCTGGACT CCTTCCCTCA 



751 CGATGTGGAA TCCGACCCCG TCAGGATATG TGGTTCTGGT AGGAGACGAG 
GCTACACCTT AGGCTGGGGC AGTCCTATAC ACCAAGACCA TCCTCTGCTC 



801 AACCTAAAAC AGTTCCCGCC TCCGTCTGAA TTTTTGCTTT CGGTTTGGAA 
TTGGATTTTG TCAAGGGCGG AGGCAGACTT AAAAACGAAA GCCAAACCTT 



851 CCGAAGCCGC GCGTCTTGTC TGCTGCAGCA TCGTTCTGTG TTGTCTCTGT 
GGCTTCGGCG CGCAGAACAG ACGACGTCGT AGCAAGACAC AACAGAGACA 



901 CTGACTGTGT TTCTGTATTT GTCTGAAAAT TAGGGCCAGA CTGTTACCAC 
GACTGACACA AAGACATAAA CAGACTTTTA ATCCCGGTCT GACAATGGTG



951 TCCCTTAAGT TTGACCTTAG GTAACTGGAA AGATGTCGAG CGGCTCGCTC
AGGGAATTCA AACTGGAATC CATTGACCTT TCTACAGCTC GCCGAGCGAG



1001 ACAACCAGTC GGTAGATGTC AAGAAGAGAC GTTGGGTTAC CTTCTGCTCT
TGTTGGTCAG CCATCTACAG TTCTTCTCTG CAACCCAATG GAAGACGAGA



1051 GCAGAATGGC CAACCTTTAA CGTCGGATGG CCGCGAGACG GCACCTTTAA
CGTCTTACCG GTTGGAAATT GCAGCCTACC GGCGCTCTGC CGTGGAAATT



1101 CCGAGACCTC ATCACCCAGG TTAAGATCAA GGTCTTTTCA CCTGGCCCGC
GGCTCTGGAG TAGTGGGTCC AATTCTAGTT CCAGAAAAGT GGACCGGGCG



1151 ATGGACACCC AGACCAGGTC CCCTACATCG TGACCTGGGA AGCCTTGGCT
TACCTGTGGG TCTGGTCCAG GGGATGTAGC ACTGGACCCT TCGGAACCGA



1201 TTTGACCCCC CTCCCTGGGT CAAGCCCTTT GTACACCCTA AGCCTCCGCC
AAACTGGGGG GAGGGACCCA GTTCGGGAAA CATGTGGGAT TCGGAGGCGG



1251 TCCTCTTCCT CCATCCGCCC CGTCTCTCCC CCTTGAACCT CCTCGTTCGA
AGGAGAAGGA GGTAGGCGGG GCAGAGAGGG GGAACTTGGA GGAGCAAGCT



1301 CCCCGCCTCG ATCCTCCCTT TATCCAGCCC TCACTCCTTC TCTAGGCGCC
GGGGCGGAGC TAGGAGGGAA ATAGGTCGGG AGTGAGGAAG AGATCCGCGG



1351 GGCCGCTCTA GCCCATTAAT ACGACTCACT ATAGGGCGAT TCGAACACCA
CCGGCGAGAT CGGGTAATTA TGCTGAGTGA TATCCCGCTA AGCTTGTGGT



1401 TGCACCATCA TCATCATCAC GTCGACTATA AAGATGAGGA CCTCGAGATG
ACGTGGTAGT AGTAGTAGTG CAGCTGATAT TTCTACTCCT GGAGCTCTAC



1451 GGCGTGATTA CGGATTCACT GGCCGTCGTG GCCCGCACCG ATCGCCCTTC
CCGCACTAAT GCCTAAGTGA CCGGCAGCAC CGGGCGTGGC TAGCGGGAAG



1501 CCAACAGTTA CGCAGCCTGA ATGGCGAATG GCGCTTTGCC TGGTTTCCGG
GGTTGTCAAT GCGTCGGACT TACCGCTTAC CGCGAAACGG ACCAAAGGCC



1551 CACCAGAAGC GGTGCCGGAA AGCTGGCTGG AGTGCGATCT TCCTGAGGCC
GTGGTCTTCG CCACGGCCTT TCGACCGACC TCACGCTAGA AGGACTCCGG



1601 GATACTGTCG TCGTCCCCTC AAACTGGCAG ATGCACGGTT ACGATGCGCC
CTATGACAGC AGCAGGGGAG TTTGACCGTC TACGTGCCAA TGCTACGCGG



1651 CATCTACACC AACGTGACCT ATCCCATTAC GGTCAATCCG CCGTTTGTTC
GTAGATGTGG TTGCACTGGA TAGGGTAATG CCAGTTAGGC GGCAAACAAG



1701 CCACGGAGAA TCCGACGGGT TGTTACTCGC TCACATTTAA TGTTGATGAA
GGTGCCTCTT AGGCTGCCCA ACAATGAGCG AGTGTAAATT ACAACTACTT



1751 AGCTGGCTAC AGGAAGGCCA GACGCGAATT ATTTTTGATG GCGTTAACTC
TCGACCGATG TCCTTCCGGT CTGCGCTTAA TAAAAACTAC CGCAATTGAG



1801 GGCGTTTCAT CTGTGGTGCA ACGGGCGCTG GGTCGGTTAC GGCCAGGACA
CCGCAAAGTA GACACCACGT TGCCCGCGAC CCAGCCAATG CCGGTCCTGT



1851 GTCGTTTGCC GTCTGAATTT GACCTGAGCG CATTTTTACG CGCCGGAGAA
CAGCAAACGG CAGACTTAAA CTGGACTCGC GTAAAAATGC GCGGCCTCTT



1901 AACCGCCTCG CGGTGATGGT GCTGCGCTGG AGTGACGGCA GTTATCTGGA
TTGGCGGAGC GCCACTACCA CGACGCGACC TCACTGCCGT CAATAGACCT



1951 AGATCAGGAT ATGTGGCGGA TGAGCGGCAT TTTCCGTGAC GTCTCGTTGC 
TCTAGTCCTA TACACCGCCT ACTCGCCGTA AAAGGCACTG CAGAGCAACG



2001 TGCATAAACC GACTACACAA ATCAGCGATT TCCATGTTGC CACTCGCTTT 
ACGTATTTGG CTGATGTGTT TAGTCGCTAA AGGTACAACG GTGAGCGAAA 



2051 AATGATGATT TCAGCCGCGC TGTACTGGAG GCTGAAGTTC AGATGTGCGG
TTACTACTAA AGTCGGCGCG ACATGACCTC CGACTTCAAG TCTACACGCC



2101 CGAGTTGCGT GACTACCTAC GGGTAACAGT TTCTTTATGG CAGGGTGAAA 
GCTCAACGCA CTGATGGATG CCCATTGTCA AAGAAATACC GTCCCACTTT 



2151 CGCAGGTCGC CAGCGGCACC GCGCCTTTCG GCGGTGAAAT TATCGATGAG 
GCGTCCAGCG GTCGCCGTGG CGCGGAAAGC CGCCACTTTA ATAGCTACTC



2201 CGTGGTGGTT ATGCCGATCG CGTCACACTA CGTCTGAACG TCGAAAACCC
GCACCACCAA TACGGCTAGC GCAGTGTGAT GCAGACTTGC AGCTTTTGGG 



2251 GAAACTGTGG AGCGCCGAAA TCCCGAATCT CTATCGTGCG GTGGTTGAAC 
CTTTGACACC TCGCGGCTTT AGGGCTTAGA GATAGCACGC CACCAACTTG



2301 TGCACACCGC CGACGGCACG CTGATTGAAG CAGAAGCCTG CGATGTCGGT
ACGTGTGGCG GCTGCCGTGC GACTAACTTC GTCTTCGGAC GCTACAGCCA 



2351 TTCCGCGAGG TGCGGATTGA AAATGGTCTG CTGCTGCTGA ACGGCAAGCC 
AAGGCGCTCC ACGCCTAACT TTTACCAGAC GACGACGACT TGCCGTTCGG 



2401 GTTGCTGATT CGAGGCGTTA ACCGTCACGA GCATCATCCT CTGCATGGTC 
CAACGACTAA GCTCCGCAAT TGGCAGTGCT CGTAGTAGGA GACGTACCAG 



2451 AGGTCATGGA TGAGCAGACG ATGGTGCAGG ATATCCTGCT GATGAAGCAG
TCCAGTACCT ACTCGTCTGC TACCACGTCC TATAGGACGA CTACTTCGTC 



2501 AACAACTTTA ACGCCGTGCG CTGTTCGCAT TATCCGAACC ATCCGCTGTG 
TTGTTGAAAT TGCGGCACGC GACAAGCGTA ATAGGCTTGG TAGGCGACAC 



2551 GTACACGCTG TGCGACCGCT ACGGCCTGTA TGTGGTGGAT GAAGCCAATA 
CATGTGCGAC ACGCTGGCGA TGCCGGACAT ACACCACCTA CTTCGGTTAT 



2601 TTGAAACCCA CGGCATGGTG CCAATGAATC GTCTGACCGA TGATCCGCGC
AACTTTGGGT GCCGTACCAC GGTTACTTAG CAGACTGGCT ACTAGGCGCG 



2651 TGGCTACCGG CGATGAGCGA ACGCGTAACG CGAATGGTGC AGCGCGATCG
ACCGATGGCC GCTACTCGCT TGCGCATTGC GCTTACCACG TCGCGCTAGC 



2701 TAATCACCCG AGTGTGATCA TCTGGTCGCT GGGGAATGAA TCAGGCCACG 
ATTAGTGGGC TCACACTAGT AGACCAGCGA CCCCTTACTT AGTCCGGTGC 



2751 GCGCTAATCA CGACGCGCTG TATCGCTGGA TCAAATCTGT CGATCCTTCC 
CGCGATTAGT GCTGCGCGAC ATAGCGACCT AGTTTAGACA GCTAGGAAGG 



2801 CGCCCGGTGC AGTATGAAGG CGGCGGAGCC GACACCACGG CCACCGATAT 
GCGGGCCACG TCATACTTCC GCCGCCTCGG CTGTGGTGCC GGTGGCTATA



2851 TATTTGCCCG ATGTACGCGC GCGTGGATGA AGACCAGCCC TTCCCGGCTG
ATAAACGGGC TACATGCGCG CGCACCTACT TCTGGTCGGG AAGGGCCGAC



2901 TGCCGAAATG GTCCATCAAA AAATGGCTTT CGCTACCTGG AGAGACGCGC
ACGGCTTTAC CAGGTAGTTT TTTACCGAAA GCGATGGACC TCTCTGCGCG



2951 CCGCTGATCC TTTGCGAATA CGCCCACGCG ATGGGTAACA GTCTTGGCGG
GGCGACTAGG AAACGCTTAT GCGGGTGCGC TACCCATTGT CAGAACCGCC



3001 TTTCGCTAAA TACTGGCAGG CGTTTCGTCA GTATCCCCGT TTACAGGGCG
AAAGCGATTT ATGACCGTCC GCAAAGCAGT CATAGGGGCA AATGTCCCGC



3051 GCTTCGTCTG GGACTGGGTG GATCAGTCGC TGATTAAATA TGATGAAAAC
CGAAGCAGAC CCTGACCCAC CTAGTCAGCG ACTAATTTAT ACTACTTTTG



3101 GGCAACCCGT GGTCGGCTTA CGGCGGTGAT TTTGGCGATA CGCCGAACGA
CCGTTGGGCA CCAGCCGAAT GCCGCCACTA AAACCGCTAT GCGGCTTGCT



3151 TCGCCAGTTC TGTATGAACG GTCTGGTCTT TGCCGACCGC ACGCCGCATC
AGCGGTCAAG ACATACTTGC CAGACCAGAA ACGGCTGGCG TGCGGCGTAG



3201 CAGCGCTGAC GGAAGCAAAA CACCAGCAGC AGTTTTTCCA GTTCCGTTTA
GTCGCGACTG CCTTCGTTTT GTGGTCGTCG TCAAAAAGGT CAAGGCAAAT



3251 TCCGGGCAAA CCATCGAAGT GACCAGCGAA TACCTGTTCC GTCATAGCGA
AGGCCCGTTT GGTAGCTTCA CTGGTCGCTT ATGGACAAGG CAGTATCGCT



3301 TAACGAGCTC CTGCACTGGA TGGTGGCGCT GGATGGTAAG CCGCTGGCAA
ATTGCTCGAG GACGTGACCT ACCACCGCGA CCTACCATTC GGCGACCGTT



3351 GCGGTGAAGT GCCTCTGGAT GTCGCTCCAC AAGGTAAACA GTTGATTGAA
CGCCACTTCA CGGAGACCTA CAGCGAGGTG TTCCATTTGT CAACTAACTT



3401 CTGCCTGAAC TACCGCAGCC GGAGAGCGCC GGGCAACTCT GGCTCACAGT
GACGGACTTG ATGGCGTCGG CCTCTCGCGG CCCGTTGAGA CCGAGTGTCA



3451 ACGCGTAGTG CAACCGAACG CGACCGCATG GTCAGAAGCC GGGCACATCA
TGCGCATCAC GTTGGCTTGC GCTGGCGTAC CAGTCTTCGG CCCGTGTAGT



3501 GCGCCTGGCA GCAGTGGCGT CTGGCGGAAA ACCTCAGTGT GACGCTCCCC
CGCGGACCGT CGTCACCGCA GACCGCCTTT TGGAGTCACA CTGCGAGGGG



3551 GCCGCGTCCC ACGCCATCCC GCATCTGACC ACCAGCGAAA TGGATTTTTG
CGGCGCAGGG TGCGGTAGGG CGTAGACTGG TGGTCGCTTT ACCTAAAAAC



3601 CATCGAGCTG GGTAATAAGC GTTGGCAATT TAACCGCCAG TCAGGCTTTC
GTAGCTCGAC CCATTATTCG CAACCGTTAA ATTGGCGGTC AGTCCGAAAG



3651 TTTCACAGAT GTGGATTGGC GATAAAAAAC AACTGCTGAC GCCGCTGCGC
AAAGTGTCTA CACCTAACCG CTATTTTTTG TTGACGACTG CGGCGACGCG



3701 GATCAGTTCA CCCGTGCACC GCTGGATAAC GACATTGGCG TAAGTGAAGC
CTAGTCAAGT GGGCACGTGG CGACCTATTG CTGTAACCGC ATTCACTTCG



3751 GACCCGCATT GACCCTAACG CCTGGGTCGA ACGCTGGAAG GCGGCGGGCC
CTGGGCGTAA CTGGGATTGC GGACCCAGCT TGCGACCTTC CGCCGCCCGG



3801 ATTACCAGGC CGAAGCAGCG TTGTTGCAGT GCACGGCAGA TACACTTGCT
TAATGGTCCG GCTTCGTCGC AACAACGTCA CGTGCCGTCT ATGTGAACGA 



3851 GATGCGGTGC TGATTACGAC CGCTCACGCG TGGCAGCATC AGGGGAAAAC
CTACGCCACG ACTAATGCTG GCGAGTGCGC ACCGTCGTAG TCCCCTTTTG 



3901 CTTATTTATC AGCCGGAAAA CCTACCGGAT TGATGGTAGT GGTCAAATGG 
GAATAAATAG TCGGCCTTTT GGATGGCCTA ACTACCATCA CCAGTTTACC 



3951 CGATTACCGT TGATGTTGAA GTGGCGAGCG ATACACCGCA TCCGGCGCGG 
GCTAATGGCA ACTACAACTT CACCGCTCGC TATGTGGCGT AGGCCGCGCC 



4001 ATTGGCCTGA ACTGCCAGCT GGCGCAGGTA GCAGAGCGGG TAAACTGGCT 
TAACCGGACT TGACGGTCGA CCGCGTCCAT CGTCTCGCCC ATTTGACCGA



4051 CGGATTAGGG CCGCAAGAAA ACTATCCCGA CCGCCTTACT GCCGCCTGTT 
GCCTAATCCC GGCGTTCTTT TGATAGGGCT GGCGGAATGA CGGCGGACAA



4101 TTGACCGCTG GGATCTGCCA TTGTCAGACA TGTATACCCC GTACGTCTTC
AACTGGCGAC CCTAGACGGT AACAGTCTGT ACATATGGGG CATGCAGAAG 



4151 CCGAGCGAAA ACGGTCTGCG CTGCGGGACG CGCGAATTGA ATTATGGCCC 
GGCTCGCTTT TGCCAGACGC GACGCCCTGC GCGCTTAACT TAATACCGGG 



4201 ACACCAGTGG CGCGGCGACT TCCAGTTCAA CATCAGCCGC TACAGTCAAC
TGTGGTCACC GCGCCGCTGA AGGTCAAGTT GTAGTCGGCG ATGTCAGTTG 



4251 AGCAACTGAT GGAAACCAGC CATCGCCATC TGCTGCACGC GGAAGAAGGC 
TCGTTGACTA CCTTTGGTCG GTAGCGGTAG ACGACGTGCG CCTTCTTCCG



4301 ACATGGCTGA ATATCGACGG TTTCCATATG GGGATTGGTG GCGACGACTC 
TGTACCGACT TATAGCTGCC AAAGGTATAC CCCTAACCAC CGCTGCTGAG 



4351 CTGGAGCCCG TCAGTATCGG CGGAATTCCA GCTGAGCGCC GGTCGCTACC
GACCTCGGGC AGTCATAGCG GCCTTAAGGT CGACTCGCGG CCAGCGATGG



4401 ATTACCAGTT GGTCTGGTGT CAAAAAAGAT CTGGAGGTGG TGGCAGCAGG
TAATGGTCAA CCAGACCACA GTTTTTTCTA GACCTCCACC ACCGTCGTCC 



4451 CCTTGGCGCG CCGGATCCTT AATTAACAAT TGACCGGTAA TAATAGGTAG
GGAACCGCGC GGCCTAGGAA TTAATTGTTA ACTGGCCATT ATTATCCATC 



4501 ATAAGTGACT GATTAGATGC ATTGATCCCT CGACCAATTC CGGTTATTTT
TATTCACTGA CTAATCTACG TAACTAGGGA GCTGGTTAAG GCCAATAAAA



4551 CCACCATATT GCCGTCTTTT GGCAATGTGA GGGCCCGGAA ACCTGGCCCT 
GGTGGTATAA CGGCAGAAAA CCGTTACACT CCCGGGCCTT TGGACCGGGA 



4601 GTCTTCTTGA CGAGCATTCC TAGGGGTCTT TCCCCTCTCG CCAAAGGAAT 
CAGAAGAACT GCTCGTAAGG ATCCCCAGAA AGGGGAGAGC GGTTTCCTTA 



4651 GCAAGGTCTG TTGAATGTCG TGAAGGAAGC AGTTCCTCTG GAAGCTTCTT
CGTTCCAGAC AACTTACAGC ACTTCCTTCG TCAAGGAGAC CTTCGAAGAA 



4701 GAAGACAAAC AACGTCTGTA GCGACCCTTT GCAGGCAGCG GAACCCCCCA 
CTTCTGTTTG TTGCAGACAT CGCTGGGAAA CGTCCGTCGC CTTGGGGGGT



4751 CCTGGCGACA GGTGCCTCTG CGGCCAAAAG CCACGTGTAT AAGATACACC
GGACCGCTGT CCACGGAGAC GCCGGTTTTC GGTGCACATA TTCTATGTGG 



4801 TGCAAAGGCG GCACAACCCC AGTGCCACGT TGTGAGTTGG ATAGTTGTGG 
ACGTTTCCGC CGTGTTGGGG TCACGGTGCA ACACTCAACC TATCAACACC



4851 AAAGAGTCAA ATGGCTCTCC TCAAGCGTAT TCAACAAGGG GCTGAAGGAT 
TTTCTCAGTT TACCGAGAGG AGTTCGCATA AGTTGTTCCC CGACTTCCTA



4901 GCCCAGAAGG TACCCCATTG TATGGGATCT GATCTGGGGC CTCGGTGCAC
CGGGTCTTCC ATGGGGTAAC ATACCCTAGA CTAGACCCCG GAGCCACGTG



4951 ATGCTTTACA TGTGTTTAGT CGAGGTTAAA AAACGTCTAG GCCCCCCGAA 
TACGAAATGT ACACAAATCA GCTCCAATTT TTTGCAGATC CGGGGGGCTT 



5001 CCACGGGGAC GTGGTTTTCC TTTGAAAAAC ACGATGATAA TACCATGATT 
GGTGCCCCTG CACCAAAAGG AAACTTTTTG TGCTACTATT ATGGTACTAA 



5051 GAACAAGATG GATTGCACGC AGGTTCTCCG GCCGCTTGGG TGGAGAGGCT 
CTTGTTCTAC CTAACGTGCG TCCAAGAGGC CGGCGAACCC ACCTCTCCGA



5101 ATTCGGCTAT GACTGGGCAC AACAGACAAT CGGCTGCTCT GATGCCGCCG
TAAGCCGATA CTGACCCGTG TTGTCTGTTA GCCGACGAGA CTACGGCGGC 



5151 TGTTCCGGCT GTCAGCGCAG GGGCGCCCGG TTCTTTTTGT CAAGACCGAC 
ACAAGGCCGA CAGTCGCGTC CCCGCGGGCC AAGAAAAACA GTTCTGGCTG 



5201 CTGTCCGGTG CCCTGAATGA ACTGCAGGAC GAGGCAGCGC GGCTATCGTG 
GACAGGCCAC GGGACTTACT TGACGTCCTG CTCCGTCGCG CCGATAGCAC 



5251 GCTGGCCACG ACGGGCGTTC CTTGCGCAGC TGTGCTCGAC GTTGTCACTG 
CGACCGGTGC TGCCCGCAAG GAACGCGTCG ACACGAGCTG CAACAGTGAC



5301 AAGCGGGAAG GGACTGGCTG CTATTGGGCG AAGTGCCGGG GCAGGATCTC 
TTCGCCCTTC CCTGACCGAC GATAACCCGC TTCACGGCCC CGTCCTAGAG 



5351 CTGTCATCTC ACCTTGCTCC TGCCGAGAAA GTATCCATCA TGGCTGATGC 
GACAGTAGAG TGGAACGAGG ACGGCTCTTT CATAGGTAGT ACCGACTACG 



5401 AATGCGGCGG CTGCATACGC TTGATCCGGC TACCTGCCCA TTCGACCACC
TTACGCCGCC GACGTATGCG AACTAGGCCG ATGGACGGGT AAGCTGGTGG 



5451 AAGCGAAACA TCGCATCGAG CGAGCACGTA CTCGGATGGA AGCCGGTCTT 
TTCGCTTTGT AGCGTAGCTC GCTCGTGCAT GAGCCTACCT TCGGCCAGAA 



5501 GTCGATCAGG ATGATCTGGA CGAAGAGCAT CAGGGGCTCG CGCCAGCCGA 
CAGCTAGTCC TACTAGACCT GCTTCTCGTA GTCCCCGAGC GCGGTCGGCT



5551 ACTGTTCGCC AGGCTCAAGG CGCGCATGCC CGACGGCGAG GATCTCGTCG 
TGACAAGCGG TCCGAGTTCC GCGCGTACGG GCTGCCGCTC CTAGAGCAGC 



5601 TGACCCATGG CGATGCCTGC TTGCCGAATA TCATGGTGGA AAATGGCCGC 
ACTGGGTACC GCTACGGACG AACGGCTTAT AGTACCACCT TTTACCGGCG 



5651 TTTTCTGGAT TCATCGACTG TGGCCGGCTG GGTGTGGCGG ACCGCTATCA 
AAAAGACCTA AGTAGCTGAC ACCGGCCGAC CCACACCGCC TGGCGATAGT



5701 GGACATAGCG TTGGCTACCC GTGATATTGC TGAAGAGCTT GGCGGCGAAT
CCTGTATCGC AACCGATGGG CACTATAACG ACTTCTCGAA CCGCCGCTTA 



5751 GGGCTGACCG CTTCCTCGTG CTTTACGGTA TCGCCGCTCC CGATTCGCAG 
CCCGACTGGC GAAGGAGCAC GAAATGCCAT AGCGGCGAGG GCTAAGCGTC 



5801 CGCATCGCCT TCTATCGCCT TCTTGACGAG TTCTTCTGAG CGGGACTCTG 
GCGTAGCGGA AGATAGCGGA AGAACTGCTC AAGAAGACTC GCCCTGAGAC 



5851 GGGTTCGCAT CGATAAAATA AAAGATTTTA TTTAGTCTCC AGAAAAAGGG 
CCCAAGCGTA GCTATTTTAT TTTCTAAAAT AAATCAGAGG TCTTTTTCCC 



5901 GGGAATGAAA GACCCCACCT GTAGGTTTGG CAAGCTAGCT TAAGTAACGC
CCCTTACTTT CTGGGGTGGA CATCCAAACC GTTCGATCGA ATTCATTGCG 



5951 CATTTTGCAA GGCATGGAAA AATACATAAC TGAGAATAGA GAAGTTCAGA
GTAAAACGTT CCGTACCTTT TTATGTATTG ACTCTTATCT CTTCAAGTCT 



6001 TCAAGGTCAG GAACAGATGG AACAGCTGAA TATGGGCCAA ACAGGATATC 
AGTTCCAGTC CTTGTCTACC TTGTCGACTT ATACCCGGTT TGTCCTATAG 



6051 TGTGGTAAGC AGTTCCTGCC CCGGCTCAGG GCCAAGAACA GATGGAACAG 
ACACCATTCG TCAAGGACGG GGCCGAGTCC CGGTTCTTGT CTACCTTGTC 



6101 CTGAATATGG GCCAAACAGG ATATCTGTGG TAAGCAGTTC CTGCCCCGGC 
GACTTATACC CGGTTTGTCC TATAGACACC ATTCGTCAAG GACGGGGCCG 



6151 TCAGGGCCAA GAACAGATGG TCCCCAGATG CGGTCCAGCC CTCAGCAGTT 
AGTCCCGGTT CTTGTCTACC AGGGGTCTAC GCCAGGTCGG GAGTCGTCAA 



6201 TCTAGAGAAC CATCAGATGT TTCCAGGGTG CCCCAAGGAC CTGAAATGAC 
AGATCTCTTG GTAGTCTACA AAGGTCCCAC GGGGTTCCTG GACTTTACTG 



6251 CCTGTGCCTT ATTTGAACTA ACCAATCAGT TCGCTTCTCG CTTCTGTTCG
GGACACGGAA TAAACTTGAT TGGTTAGTCA AGCGAAGAGC GAAGACAAGC 



6301 CGCGCTTCTG CTCCCCGAGC TCAATAAAAG AGCCCACAAC CCCTCACTCG
GCGCGAAGAC GAGGGGCTCG AGTTATTTTC TCGGGTGTTG GGGAGTGAGC 



6351 GGGCGCCAGT CCTCCGATTG ACTGAGTCGC CCGGGTACCC GTGTATCCAA 
CCCGCGGTCA GGAGGCTAAC TGACTCAGCG GGCCCATGGG CACATAGGTT 



6401 TAAACCCTCT TGCAGTTGCA TCCGACTTGT GGTCTCGCTG TTCCTTGGGA 
ATTTGGGAGA ACGTCAACGT AGGCTGAACA CCAGAGCGAC AAGGAACCCT 



6451 GGGTCTCCTC TGAGTGATTG ACTACCCGTC AGCGGGGGTC TTTCATTCAT 
CCCAGAGGAG ACTCACTAAC TGATGGGCAG TCGCCCCCAG AAAGTAAGTA 



6501 GCAGCATGTA TCAAAATTAA TTTGGTTTTT TTTCTTAAGT ATTTACATTA 
CGTCGTACAT AGTTTTAATT AAACCAAAAA AAAGAATTCA TAAATGTAAT 



6551 AATGGCCATA GTTGCATTAA TGAATCGGCC AACGCGCGGG GAGAGGCGGT 
TTACCGGTAT CAACGTAATT ACTTAGCCGG TTGCGCGCCC CTCTCCGCCA 



6601 TTGCGTATTG GCGCTCTTCC GCTTCCTCGC TCACTGACTC GCTGCGCTCG 
AACGCATAAC CGCGAGAAGG CGAAGGAGCG AGTGACTGAG CGACGCGAGC



6651 GTCGTTCGGC TGCGGCGAGC GGTATCAGCT CACTCAAAGG CGGTAATACG
CAGCAAGCCG ACGCCGCTCG CCATAGTCGA GTGAGTTTCC GCCATTATGC 



6701 GTTATCCACA GAATCAGGGG ATAACGCAGG AAAGAACATG TGAGCAAAAG 
CAATAGGTGT CTTAGTCCCC TATTGCGTCC TTTCTTGTAC ACTCGTTTTC 



6751 GCCAGCAAAA GGCCAGGAAC CGTAAAAAGG CCGCGTTGCT GGCGTTTTTC 
CGGTCGTTTT CCGGTCCTTG GCATTTTTCC GGCGCAACGA CCGCAAAAAG 



6801 CATAGGCTCC GCCCCCCTGA CGAGCATCAC AAAAATCGAC GCTCAAGTCA 
GTATCCGAGG CGGGGGGACT GCTCGTAGTG TTTTTAGCTG CGAGTTCAGT



6851 GAGGTGGCGA AACCCGACAG GACTATAAAG ATACCAGGCG TTTCCCCCTG 
CTCCACCGCT TTGGGCTGTC CTGATATTTC TATGGTCCGC AAAGGGGGAC 



6901 GAAGCTCCCT CGTGCGCTCT CCTGTTCCGA CCCTGCCGCT TACCGGATAC 
CTTCGAGGGA GCACGCGAGA GGACAAGGCT GGGACGGCGA ATGGCCTATG



6951 CTGTCCGCCT TTCTCCCTTC GGGAAGCGTG GCGCTTTCTC ATAGCTCACG 
GACAGGCGGA AAGAGGGAAG CCCTTCGCAC CGCGAAAGAG TATCGAGTGC 



7001 CTGTAGGTAT CTCAGTTCGG TGTAGGTCGT TCGCTCCAAG CTGGGCTGTG 
GACATCCATA GAGTCAAGCC ACATCCAGCA AGCGAGGTTC GACCCGACAC 



7051 TGCACGAACC CCCCGTTCAG CCCGACCGCT GCGCCTTATC CGGTAACTAT 
ACGTGCTTGG GGGGCAAGTC GGGCTGGCGA CGCGGAATAG GCCATTGATA 



7101 CGTCTTGAGT CCAACCCGGT AAGACACGAC TTATCGCCAC TGGCAGCAGC 
GCAGAACTCA GGTTGGGCCA TTCTGTGCTG AATAGCGGTG ACCGTCGTCG 



7151 CACTGGTAAC AGGATTAGCA GAGCGAGGTA TGTAGGCGGT GCTACAGAGT
GTGACCATTG TCCTAATCGT CTCGCTCCAT ACATCCGCCA CGATGTCTCA 



7201 TCTTGAAGTG GTGGCCTAAC TACGGCTACA CTAGAAGAAC AGTATTTGGT 
AGAACTTCAC CACCGGATTG ATGCCGATGT GATCTTCTTG TCATAAACCA



7251 ATCTGCGCTC TGCTGAAGCC AGTTACCTTC GGAAAAAGAG TTGGTAGCTC 
TAGACGCGAG ACGACTTCGG TCAATGGAAG CCTTTTTCTC AACCATCGAG 



7301 TTGATCCGGC AAACAAACCA CCGCTGGTAG CGGTGGTTTT TTTGTTTGCA 
AACTAGGCCG TTTGTTTGGT GGCGACCATC GCCACCAAAA AAACAAACGT 



7351 AGCAGCAGAT TACGCGCAGA AAAAAAGGAT CTCAAGAAGA TCCTTTGATC 
TCGTCGTCTA ATGCGCGTCT TTTTTTCCTA GAGTTCTTCT AGGAAACTAG 



7401 TTTTCTACGG GGTCTGACGC TCAGTGGAAC GAAAACTCAC GTTAAGGGAT 
AAAAGATGCC CCAGACTGCG AGTCACCTTG CTTTTGAGTG CAATTCCCTA 



7451 TTTGGTCATG AGATTATCAA AAAGGATCTT CACCTAGATC CTTTTGCGGC 
AAACCAGTAC TCTAATAGTT TTTCCTAGAA GTGGATCTAG GAAAACGCCG 



7501 CGCAAATCAA TCTAAAGTAT ATATGAGTAA ACTTGGTCTG ACAGTTACCA 
GCGTTTAGTT AGATTTCATA TATACTCATT TGAACCAGAC TGTCAATGGT 



7551 ATGCTTAATC AGTGAGGCAC CTATCTCAGC GATCTGTCTA TTTCGTTCAT
TACGAATTAG TCACTCCGTG GATAGAGTCG CTAGACAGAT AAAGCAAGTA



7601 CCATAGTTGC CTGACTCCCC GTCGTGTAGA TAACTACGAT ACGGGAGGGC
GGTATCAACG GACTGAGGGG CAGCACATCT ATTGATGCTA TGCCCTCCCG 



7651 TTACCATCTG GCCCCAGTGC TGCAATGATA CCGCGAGACC CACGCTCACC 
AATGGTAGAC CGGGGTCACG ACGTTACTAT GGCGCTCTGG GTGCGAGTGG 



7701 GGCTCCAGAT TTATCAGCAA TAAACCAGCC AGCCGGAAGG GCCGAGCGCA 
CCGAGGTCTA AATAGTCGTT ATTTGGTCGG TCGGCCTTCC CGGCTCGCGT 



7751 GAAGTGGTCC TGCAACTTTA TCCGCCTCCA TCCAGTCTAT TAATTGTTGC 
CTTCACCAGG ACGTTGAAAT AGGCGGAGGT AGGTCAGATA ATTAACAACG 



7801 CGGGAAGCTA GAGTAAGTAG TTCGCCAGTT AATAGTTTGC GCAACGTTGT
GCCCTTCGAT CTCATTCATC AAGCGGTCAA TTATCAAACG CGTTGCAACA



7851 TGCCATTGCT ACAGGCATCG TGGTGTCACG CTCGTCGTTT GGTATGGCTT 
ACGGTAACGA TGTCCGTAGC ACCACAGTGC GAGCAGCAAA CCATACCGAA 



7901 CATTCAGCTC CGGTTCCCAA CGATCAAGGC GAGTTACATG ATCCCCCATG 
GTAAGTCGAG GCCAAGGGTT GCTAGTTCCG CTCAATGTAC TAGGGGGTAC



7951 TTGTGCAAAA AAGCGGTTAG CTCCTTCGGT CCTCCGATCG TTGTCAGAAG 
AACACGTTTT TTCGCCAATC GAGGAAGCCA GGAGGCTAGC AACAGTCTTC



8001 TAAGTTGGCC GCAGTGTTAT CACTCATGGT TATGGCAGCA CTGCATAATT 
ATTCAACCGG CGTCACAATA GTGAGTACCA ATACCGTCGT GACGTATTAA



8051 CTCTTACTGT CATGCCATCC GTAAGATGCT TTTCTGTGAC TGGTGAGTAC 
GAGAATGACA GTACGGTAGG CATTCTACGA AAAGACACTG ACCACTCATG 



8101 TCAACCAAGT CATTCTGAGA ATAGTGTATG CGGCGACCGA GTTGCTCTTG 
AGTTGGTTCA GTAAGACTCT TATCACATAC GCCGCTGGCT CAACGAGAAC



8151 CCCGGCGTCA ATACGGGATA ATACCGCGCC ACATAGCAGA ACTTTAAAAG
GGGCCGCAGT TATGCCCTAT TATGGCGCGG TGTATCGTCT TGAAATTTTC 



8201 TGCTCATCAT TGGAAAACGT TCTTCGGGGC GAAAACTCTC AAGGATCTTA 
ACGAGTAGTA ACCTTTTGCA AGAAGCCCCG CTTTTGAGAG TTCCTAGAAT 



8251 CCGCTGTTGA GATCCAGTTC GATGTAACCC ACTCGTGCAC CCAACTGATC 
GGCGACAACT CTAGGTCAAG CTACATTGGG TGAGCACGTG GGTTGACTAG 



8301 TTCAGCATCT TTTACTTTCA CCAGCGTTTC TGGGTGAGCA AAAACAGGAA 
AAGTCGTAGA AAATGAAAGT GGTCGCAAAG ACCCACTCGT TTTTGTCCTT 



8351 GGCAAAATGC CGCAAAAAAG GGAATAAGGG CGACACGGAA ATGTTGAATA 
CCGTTTTACG GCGTTTTTTC CCTTATTCCC GCTGTGCCTT TACAACTTAT 



8401 CTCATACTCT TCCTTTTTCA ATATTATTGA AGCATTTATC AGGGTTATTG 
GAGTATGAGA AGGAAAAAGT TATAATAACT TCGTAAATAG TCCCAATAAC



8451 TCTCATGAGC GGATACATAT TTGAATGTAT TTAGAAAAAT AAACAAATAG 
AGAGTACTCG CCTATGTATA AACTTACATA AATCTTTTTA TTTGTTTATC 



8501 GGGTTCCGCG CACATTTC
CCCAAGGCGC GTGTAAAG



1 CTGCAGCCTG AATATGGGCC AAACAGGATA TCTGTGGTAA GCAGTTCCTG
GACGTCGGAC TTATACCCGG TTTGTCCTAT AGACACCATT CGTCAAGGAC 



51 CCCCGGCTCA GGGCCAAGAA CAGATGGAAC AGCTGAATAT GGGCCAAACA 
GGGGCCGAGT CCCGGTTCTT GTCTACCTTG TCGACTTATA CCCGGTTTGT 



101 GGATATCTGT GGTAAGCAGT TCCTGCCCCG GCTCAGGGCC AAGAACAGAT 
CCTATAGACA CCATTCGTCA AGGACGGGGC CGAGTCCCGG TTCTTGTCTA



151 GGTCCCCAGA TGCGGTCCAG CCCTCAGCAG TTTCTAGAGA ACCATCAGAT
CCAGGGGTCT ACGCCAGGTC GGGAGTCGTC AAAGATCTCT TGGTAGTCTA



201 GTTTCCAGGG TGCCCCAAGG ACCTGAAATG ACCCTGTGCC TTATTTGAAC 
CAAAGGTCCC ACGGGGTTCC TGGACTTTAC TGGGACACGG AATAAACTTG



251 TAACCAATCA GTTCGCTTCT CGCTTCTGTT CGCGCGCTTC TGCTCCCCGA 
ATTGGTTAGT CAAGCGAAGA GCGAAGACAA GCGCGCGAAG ACGAGGGGCT 



301 GCTCAATAAA AGAGCCCACA ACCCCTCACT CGGGGCGCCA GTCCTCCGAT
CGAGTTATTT TCTCGGGTGT TGGGGAGTGA GCCCCGCGGT CAGGAGGCTA 



351 TGACTGAGTC GCCCGGGTAC CCGTGTATCC AATAAACCCT CTTGCAGTTG 
ACTGACTCAG CGGGCCCATG GGCACATAGG TTATTTGGGA GAACGTCAAC



401 CATCCGACTT GTGGTCTCGC TGTTCCTTGG GAGGGTCTCC TCTGAGTGAT
GTAGGCTGAA CACCAGAGCG ACAAGGAACC CTCCCAGAGG AGACTCACTA 



451 TGACTACCCG TCAGCGGGGG TCTTTCATTT GGGGGCTCGT CCGGGATCGG
ACTGATGGGC AGTCGCCCCC AGAAAGTAAA CCCCCGAGCA GGCCCTAGCC 



501 GAGACCCCTG CCCAGGGACC ACCGACCCAC CACCGGGAGG CAAGCTGGCC 
CTCTGGGGAC GGGTCCCTGG TGGCTGGGTG GTGGCCCTCC GTTCGACCGG



551 AGCAACTTAT CTGTGTCTGT CCGATTGTCT AGTGTCTATG ACTGATTTTA
TCGTTGAATA GACACAGACA GGCTAACAGA TCACAGATAC TGACTAAAAT 



601 TGCGCCTGCG TCGGTACTAG TTAGCTAACT AGCTCTGTAT CTGGCGGACC 
ACGCGGACGC AGCCATGATC AATCGATTGA TCGAGACATA GACCGCCTGG



651 CGTGGTGGAA CTGACGAGTT CTGAACACCC GGCCGCAACC CTGGGAGACG 
GCACCACCTT GACTGCTCAA GACTTGTGGG CCGGCGTTGG GACCCTCTGC



701 TCCCAGGGAC TTTGGGGGCC GTTTTTGTGG CCCGACCTGA GGAAGGGAGT 
AGGGTCCCTG AAACCCCCGG CAAAAACACC GGGCTGGACT CCTTCCCTCA 



751 CGATGTGGAA TCCGACCCCG TCAGGATATG TGGTTCTGGT AGGAGACGAG 
GCTACACCTT AGGCTGGGGC AGTCCTATAC ACCAAGACCA TCCTCTGCTC 



801 AACCTAAAAC AGTTCCCGCC TCCGTCTGAA TTTTTGCTTT CGGTTTGGAA 
TTGGATTTTG TCAAGGGCGG AGGCAGACTT AAAAACGAAA GCCAAACCTT 



851 CCGAAGCCGC GCGTCTTGTC TGCTGCAGCA TCGTTCTGTG TTGTCTCTGT 
GGCTTCGGCG CGCAGAACAG ACGACGTCGT AGCAAGACAC AACAGAGACA 



901 CTGACTGTGT TTCTGTATTT GTCTGAAAAT TAGGGCCAGA CTGTTACCAC 
GACTGACACA AAGACATAAA CAGACTTTTA ATCCCGGTCT GACAATGGTG



951 TCCCTTAAGT TTGACCTTAG GTAACTGGAA AGATGTCGAG CGGCTCGCTC
AGGGAATTCA AACTGGAATC CATTGACCTT TCTACAGCTC GCCGAGCGAG 



1001 ACAACCAGTC GGTAGATGTC AAGAAGAGAC GTTGGGTTAC CTTCTGCTCT 
TGTTGGTCAG CCATCTACAG TTCTTCTCTG CAACCCAATG GAAGACGAGA 



1051 GCAGAATGGC CAACCTTTAA CGTCGGATGG CCGCGAGACG GCACCTTTAA
CGTCTTACCG GTTGGAAATT GCAGCCTACC GGCGCTCTGC CGTGGAAATT



1101 CCGAGACCTC ATCACCCAGG TTAAGATCAA GGTCTTTTCA CCTGGCCCGC 
GGCTCTGGAG TAGTGGGTCC AATTCTAGTT CCAGAAAAGT GGACCGGGCG 



1151 ATGGACACCC AGACCAGGTC CCCTACATCG TGACCTGGGA AGCCTTGGCT 
TACCTGTGGG TCTGGTCCAG GGGATGTAGC ACTGGACCCT TCGGAACCGA 



1201 TTTGACCCCC CTCCCTGGGT CAAGCCCTTT GTACACCCTA AGCCTCCGCC 
AAACTGGGGG GAGGGACCCA GTTCGGGAAA CATGTGGGAT TCGGAGGCGG



1251 TCCTCTTCCT CCATCCGCCC CGTCTCTCCC CCTTGAACCT CCTCGTTCGA 
AGGAGAAGGA GGTAGGCGGG GCAGAGAGGG GGAACTTGGA GGAGCAAGCT 



1301 CCCCGCCTCG ATCCTCCCTT TATCCAGCCC TCACTCCTTC TCTAGGCGCC 
GGGGCGGAGC TAGGAGGGAA ATAGGTCGGG AGTGAGGAAG AGATCCGCGG 



1351 GGCCGCTCTA GCCCATTAAT ACGACTCACT ATAGGGCGAT TCGAATCAGG 
CCGGCGAGAT CGGGTAATTA TGCTGAGTGA TATCCCGCTA AGCTTAGTCC 



1401 CCTTGGCGCG CCGGATCCTT AATTAAGCGC AATTGGGAGG TGGCGGTAGC
GGAACCGCGC GGCCTAGGAA TTAATTCGCG TTAACCCTCC ACCGCCATCG 



1451 CTCGAGATGG GCGTGATTAC GGATTCACTG GCCGTCGTTT TACAACGTCG
GAGCTCTACC CGCACTAATG CCTAAGTGAC CGGCAGCAAA ATGTTGCAGC 



1501 TGACTGGGAA AACCCTGGCG TTACCCAACT TAATCGCCTT GCAGCACATC
ACTGACCCTT TTGGGACCGC AATGGGTTGA ATTAGCGGAA CGTCGTGTAG 



1551 CCCCTTTCGC CAGCTGGCGT AATAGCGAAG AGGCCCGCAC CGATCGCCCT 
GGGGAAAGCG GTCGACCGCA TTATCGCTTC TCCGGGCGTG GCTAGCGGGA 



1601 TCCCAACAGT TACGCAGCCT GAATGGCGAA TGGCGCTTTG CCTGGTTTCC 
AGGGTTGTCA ATGCGTCGGA CTTACCGCTT ACCGCGAAAC GGACCAAAGG 



1651 GGCACCAGAA GCGGTGCCGG AAAGCTGGCT GGAGTGCGAT CTTCCTGAGG 
CCGTGGTCTT CGCCACGGCC TTTCGACCGA CCTCACGCTA GAAGGACTCC 



1701 CCGATACTGT CGTCGTCCCC TCAAACTGGC AGATGCACGG TTACGATGCG 
GGCTATGACA GCAGCAGGGG AGTTTGACCG TCTACGTGCC AATGCTACGC 



1751 CCCATCTACA CCAACGTGAC CTATCCCATT ACGGTCAATC CGCCGTTTGT 
GGGTAGATGT GGTTGCACTG GATAGGGTAA TGCCAGTTAG GCGGCAAACA 



1801 TCCCACGGAG AATCCGACGG GTTGTTACTC GCTCACATTT AATGTTGATG 
AGGGTGCCTC TTAGGCTGCC CAACAATGAG CGAGTGTAAA TTACAACTAC



1851 AAAGCTGGCT ACAGGAAGGC CAGACGCGAA TTATTTTTGA TGGCGTTAAC
TTTCGACCGA TGTCCTTCCG GTCTGCGCTT AATAAAAACT ACCGCAATTG



1901 TCGGCGTTTC ATCTGTGGTG CAACGGGCGC TGGGTCGGTT ACGGCCAGGA
AGCCGCAAAG TAGACACCAC GTTGCCCGCG ACCCAGCCAA TGCCGGTCCT 



1951 CAGTCGTTTG CCGTCTGAAT TTGACCTGAG CGCATTTTTA CGCGCCGGAG 
GTCAGCAAAC GGCAGACTTA AACTGGACTC GCGTAAAAAT GCGCGGCCTC 



2001 AAAACCGCCT CGCGGTGATG GTGCTGCGCT GGAGTGACGG CAGTTATCTG 
TTTTGGCGGA GCGCCACTAC CACGACGCGA CCTCACTGCC GTCAATAGAC



2051 GAAGATCAGG ATATGTGGCG GATGAGCGGC ATTTTCCGTG ACGTCTCGTT
CTTCTAGTCC TATACACCGC CTACTCGCCG TAAAAGGCAC TGCAGAGCAA



2101 GCTGCATAAA CCGACTACAC AAATCAGCGA TTTCCATGTT GCCACTCGCT 
CGACGTATTT GGCTGATGTG TTTAGTCGCT AAAGGTACAA CGGTGAGCGA 



2151 TTAATGATGA TTTCAGCCGC GCTGTACTGG AGGCTGAAGT TCAGATGTGC 
AATTACTACT AAAGTCGGCG CGACATGACC TCCGACTTCA AGTCTACACG 



2201 GGCGAGTTGC GTGACTACCT ACGGGTAACA GTTTCTTTAT GGCAGGGTGA 
CCGCTCAACG CACTGATGGA TGCCCATTGT CAAAGAAATA CCGTCCCACT 



2251 AACGCAGGTC GCCAGCGGCA CCGCGCCTTT CGGCGGTGAA ATTATCGATG 
TTGCGTCCAG CGGTCGCCGT GGCGCGGAAA GCCGCCACTT TAATAGCTAC 



2301 AGCGTGGTGG TTATGCCGAT CGCGTCACAC TACGTCTGAA CGTCGAAAAC 
TCGCACCACC AATACGGCTA GCGCAGTGTG ATGCAGACTT GCAGCTTTTG 



2351 CCGAAACTGT GGAGCGCCGA AATCCCGAAT CTCTATCGTG CGGTGGTTGA 
GGCTTTGACA CCTCGCGGCT TTAGGGCTTA GAGATAGCAC GCCACCAACT



2401 ACTGCACACC GCCGACGGCA CGCTGATTGA AGCAGAAGCC TGCGATGTCG 
TGACGTGTGG CGGCTGCCGT GCGACTAACT TCGTCTTCGG ACGCTACAGC 



2451 GTTTCCGCGA GGTGCGGATT GAAAATGGTC TGCTGCTGCT GAACGGCAAG 
CAAAGGCGCT CCACGCCTAA CTTTTACCAG ACGACGACGA CTTGCCGTTC 



2501 CCGTTGCTGA TTCGAGGCGT TAACCGTCAC GAGCATCATC CTCTGCATGG 
GGCAACGACT AAGCTCCGCA ATTGGCAGTG CTCGTAGTAG GAGACGTACC 



2551 TCAGGTCATG GATGAGCAGA CGATGGTGCA GGATATCCTG CTGATGAAGC 
AGTCCAGTAC CTACTCGTCT GCTACCACGT CCTATAGGAC GACTACTTCG 



2601 AGAACAACTT TAACGCCGTG CGCTGTTCGC ATTATCCGAA CCATCCGCTG 
TCTTGTTGAA ATTGCGGCAC GCGACAAGCG TAATAGGCTT GGTAGGCGAC 



2651 TGGTACACGC TGTGCGACCG CTACGGCCTG TATGTGGTGG ATGAAGCCAA 
ACCATGTGCG ACACGCTGGC GATGCCGGAC ATACACCACC TACTTCGGTT



2701 TATTGAAACC CACGGCATGG TGCCAATGAA TCGTCTGACC GATGATCCGC 
ATAACTTTGG GTGCCGTACC ACGGTTACTT AGCAGACTGG CTACTAGGCG 



2751 GCTGGCTACC GGCGATGAGC GAACGCGTAA CGCGAATGGT GCAGCGCGAT
CGACCGATGG CCGCTACTCG CTTGCGCATT GCGCTTACCA CGTCGCGCTA



2801 CGTAATCACC CGAGTGTGAT CATCTGGTCG CTGGGGAATG AATCAGGCCA
GCATTAGTGG GCTCACACTA GTAGACCAGC GACCCCTTAC TTAGTCCGGT



2851 CGGCGCTAAT CACGACGCGC TGTATCGCTG GATCAAATCT GTCGATCCTT
GCCGCGATTA GTGCTGCGCG ACATAGCGAC CTAGTTTAGA CAGCTAGGAA 



2901 CCCGCCCGGT GCAGTATGAA GGCGGCGGAG CCGACACCAC GGCCACCGAT
GGGCGGGCCA CGTCATACTT CCGCCGCCTC GGCTGTGGTG CCGGTGGCTA



2951 ATTATTTGCC CGATGTACGC GCGCGTGGAT GAAGACCAGC CCTTCCCGGC 
TAATAAACGG GCTACATGCG CGCGCACCTA CTTCTGGTCG GGAAGGGCCG



3001 TGTGCCGAAA TGGTCCATCA AAAAATGGCT TTCGCTACCT GGAGAGACGC 
ACACGGCTTT ACCAGGTAGT TTTTTACCGA AAGCGATGGA CCTCTCTGCG 



3051 GCCCGCTGAT CCTTTGCGAA TACGCCCACG CGATGGGTAA CAGTCTTGGC 
CGGGCGACTA GGAAACGCTT ATGCGGGTGC GCTACCCATT GTCAGAACCG 



3101 GGTTTCGCTA AATACTGGCA GGCGTTTCGT CAGTATCCCC GTTTACAGGG
CCAAAGCGAT TTATGACCGT CCGCAAAGCA GTCATAGGGG CAAATGTCCC 



3151 CGGCTTCGTC TGGGACTGGG TGGATCAGTC GCTGATTAAA TATGATGAAA 
GCCGAAGCAG ACCCTGACCC ACCTAGTCAG CGACTAATTT ATACTACTTT 



3201 ACGGCAACCC GTGGTCGGCT TACGGCGGTG ATTTTGGCGA TACGCCGAAC 
TGCCGTTGGG CACCAGCCGA ATGCCGCCAC TAAAACCGCT ATGCGGCTTG 



3251 GATCGCCAGT TCTGTATGAA CGGTCTGGTC TTTGCCGACC GCACGCCGCA
CTAGCGGTCA AGACATACTT GCCAGACCAG AAACGGCTGG CGTGCGGCGT 



3301 TCCAGCGCTG ACGGAAGCAA AACACCAGCA GCAGTTTTTC CAGTTCCGTT 
AGGTCGCGAC TGCCTTCGTT TTGTGGTCGT CGTCAAAAAG GTCAAGGCAA 



3351 TATCCGGGCA AACCATCGAA GTGACCAGCG AATACCTGTT CCGTCATAGC
ATAGGCCCGT TTGGTAGCTT CACTGGTCGC TTATGGACAA GGCAGTATCG



3401 GATAACGAGC TCCTGCACTG GATGGTGGCG CTGGATGGTA AGCCGCTGGC 
CTATTGCTCG AGGACGTGAC CTACCACCGC GACCTACCAT TCGGCGACCG 



3451 AAGCGGTGAA GTGCCTCTGG ATGTCGCTCC ACAAGGTAAA CAGTTGATTG 
TTCGCCACTT CACGGAGACC TACAGCGAGG TGTTCCATTT GTCAACTAAC 



3501 AACTGCCTGA ACTACCGCAG CCGGAGAGCG CCGGGCAACT CTGGCTCACA 
TTGACGGACT TGATGGCGTC GGCCTCTCGC GGCCCGTTGA GACCGAGTGT 



3551 GTACGCGTAG TGCAACCGAA CGCGACCGCA TGGTCAGAAG CCGGGCACAT 
CATGCGCATC ACGTTGGCTT GCGCTGGCGT ACCAGTCTTC GGCCCGTGTA



3601 CAGCGCCTGG CAGCAGTGGC GTCTGGCGGA AAACCTCAGT GTGACGCTCC
GTCGCGGACC GTCGTCACCG CAGACCGCCT TTTGGAGTCA CACTGCGAGG 



3651 CCGCCGCGTC CCACGCCATC CCGCATCTGA CCACCAGCGA AATGGATTTT 
GGCGGCGCAG GGTGCGGTAG GGCGTAGACT GGTGGTCGCT TTACCTAAAA 



3701 TGCATCGAGC TGGGTAATAA GCGTTGGCAA TTTAACCGCC AGTCAGGCTT
ACGTAGCTCG ACCCATTATT CGCAACCGTT AAATTGGCGG TCAGTCCGAA 



3751 TCTTTCACAG ATGTGGATTG GCGATAAAAA ACAACTGCTG ACGCCGCTGC 
AGAAAGTGTC TACACCTAAC CGCTATTTTT TGTTGACGAC TGCGGCGACG 



3801 GCGATCAGTT CACCCGTGTC GATAGATCTG AACAGAAACT CATTTCCGAA
CGCTAGTCAA GTGGGCACAG CTATCTAGAC TTGTCTTTGA GTAAAGGCTT



3851 GAAGACCTAG TCGACCATCA TCATCATCAT CACCGGTAAT AATAGGTAGA
CTTCTGGATC AGCTGGTAGT AGTAGTAGTA GTGGCCATTA TTATCCATCT



3901 TAAGTGACTG ATTAGATGCA TTTCGACTAG ATCCCTCGAC CAATTCCGGT
ATTCACTGAC TAATCTACGT AAAGCTGATC TAGGGAGCTG GTTAAGGCCA



3951 TATTTTCCAC CATATTGCCG TCTTTTGGCA ATGTGAGGGC CCGGAAACCT
ATAAAAGGTG GTATAACGGC AGAAAACCGT TACACTCCCG GGCCTTTGGA



4001 GGCCCTGTCT TCTTGACGAG CATTCCTAGG GGTCTTTCCC CTCTCGCCAA
CCGGGACAGA AGAACTGCTC GTAAGGATCC CCAGAAAGGG GAGAGCGGTT



4051 AGGAATGCAA GGTCTGTTGA ATGTCGTGAA GGAAGCAGTT CCTCTGGAAG
TCCTTACGTT CCAGACAACT TACAGCACTT CCTTCGTCAA GGAGACCTTC



4101 CTTCTTGAAG ACAAACAACG TCTGTAGCGA CCCTTTGCAG GCAGCGGAAC
GAAGAACTTC TGTTTGTTGC AGACATCGCT GGGAAACGTC CGTCGCCTTG



4151 CCCCCACCTG GCGACAGGTG CCTCTGCGGC CAAAAGCCAC GTGTATAAGA
GGGGGTGGAC CGCTGTCCAC GGAGACGCCG GTTTTCGGTG CACATATTCT



4201 TACACCTGCA AAGGCGGCAC AACCCCAGTG CCACGTTGTG AGTTGGATAG
ATGTGGACGT TTCCGCCGTG TTGGGGTCAC GGTGCAACAC TCAACCTATC



4251 TTGTGGAAAG AGTCAAATGG CTCTCCTCAA GCGTATTCAA CAAGGGGCTG
AACACCTTTC TCAGTTTACC GAGAGGAGTT CGCATAAGTT GTTCCCCGAC



4301 AAGGATGCCC AGAAGGTACC CCATTGTATG GGATCTGATC TGGGGCCTCG
TTCCTACGGG TCTTCCATGG GGTAACATAC CCTAGACTAG ACCCCGGAGC



4351 GTGCACATGC TTTACATGTG TTTAGTCGAG GTTAAAAAAC GTCTAGGCCC
CACGTGTACG AAATGTACAC AAATCAGCTC CAATTTTTTG CAGATCCGGG



4401 CCCGAACCAC GGGGACGTGG TTTTCCTTTG AAAAACACGA TGATAATACC
GGGCTTGGTG CCCCTGCACC AAAAGGAAAC TTTTTGTGCT ACTATTATGG



4451 ATGAAAAAGC CTGAACTCAC CGCGACGTCT GTCGAGAAGT TTCTGATCGA
TACTTTTTCG GACTTGAGTG GCGCTGCAGA CAGCTCTTCA AAGACTAGCT



4501 AAAGTTCGAC AGCGTCTCCG ACCTGATGCA GCTCTCGGAG GGCGAAGAAT
TTTCAAGCTG TCGCAGAGGC TGGACTACGT CGAGAGCCTC CCGCTTCTTA



4551 CTCGTGCTTT CAGCTTCGAT GTAGGAGGGC GTGGATATGT CCTGCGGGTA
GAGCACGAAA GTCGAAGCTA CATCCTCCCG CACCTATACA GGACGCCCAT



4601 AATAGCTGCG CCGATGGTTT CTACAAAGAT CGTTATGTTT ATCGGCACTT
TTATCGACGC GGCTACCAAA GATGTTTCTA GCAATACAAA TAGCCGTGAA



4651 TGCATCGGCC GCGCTCCCGA TTCCGGAAGT GCTTGACATT GGGGAATTTA
ACGTAGCCGG CGCGAGGGCT AAGGCCTTCA CGAACTGTAA CCCCTTAAAT



4701 GCGAGAGCCT GACCTATTGC ATCTCCCGCC GTGCACAGGG TGTCACGTTG
CGCTCTCGGA CTGGATAACG TAGAGGGCGG CACGTGTCCC ACAGTGCAAC



4751 CAAGACCTGC CTGAAACCGA ACTGCCCGCT GTTCTGCAGC CGGTCGCGGA
GTTCTGGACG GACTTTGGCT TGACGGGCGA CAAGACGTCG GCCAGCGCCT



4801 GGCCATGGAT GCGATCGCTG CGGCCGATCT TAGCCAGACG AGCGGGTTCG 
CCGGTACCTA CGCTAGCGAC GCCGGCTAGA ATCGGTCTGC TCGCCCAAGC 



4851 GCCCATTCGG ACCGCAAGGA ATCGGTCAAT ACACTACATG GCGTGATTTC 
CGGGTAAGCC TGGCGTTCCT TAGCCAGTTA TGTGATGTAC CGCACTAAAG 



4901 ATATGCGCGA TTGCTGATCC CCATGTGTAT CACTGGCAAA CTGTGATGGA 
TATACGCGCT AACGACTAGG GGTACACATA GTGACCGTTT GACACTACCT 



4951 CGACACCGTC AGTGCGTCCG TCGCGCAGGC TCTCGATGAG CTGATGCTTT 
GCTGTGGCAG TCACGCAGGC AGCGCGTCCG AGAGCTACTC GACTACGAAA 



5001 GGGCCGAGGA CTGCCCCGAA GTCCGGCACC TCGTGCACGC GGATTTCGGC 
CCCGGCTCCT GACGGGGCTT CAGGCCGTGG AGCACGTGCG CCTAAAGCCG



5051 TCCAACAATG TCCTGACGGA CAATGGCCGC ATAACAGCGG TCATTGACTG 
AGGTTGTTAC AGGACTGCCT GTTACCGGCG TATTGTCGCC AGTAACTGAC 



5101 GAGCGAGGCG ATGTTCGGGG ATTCCCAATA CGAGGTCGCC AACATCTTCT
CTCGCTCCGC TACAAGCCCC TAAGGGTTAT GCTCCAGCGG TTGTAGAAGA 



5151 TCTGGAGGCC GTGGTTGGCT TGTATGGAGC AGCAGACGCG CTACTTCGAG 
AGACCTCCGG CACCAACCGA ACATACCTCG TCGTCTGCGC GATGAAGCTC



5201 CGGAGGCATC CGGAGCTTGC AGGATCGCCG CGGCTCCGGG CGTATATGCT
GCCTCCGTAG GCCTCGAACG TCCTAGCGGC GCCGAGGCCC GCATATACGA 



5251 CCGCATTGGT CTTGACCAAC TCTATCAGAG CTTGGTTGAC GGCAATTTCG 
GGCGTAACCA GAACTGGTTG AGATAGTCTC GAACCAACTG CCGTTAAAGC 



5301 ATGATGCAGC TTGGGCGCAG GGTCGATGCG ACGCAATCGT CCGATCCGGA 
TACTACGTCG AACCCGCGTC CCAGCTACGC TGCGTTAGCA GGCTAGGCCT 



5351 GCCGGGACTG TCGGGCGTAC ACAAATCGCC CGCAGAAGCG CGGCCGTCTG 
CGGCCCTGAC AGCCCGCATG TGTTTAGCGG GCGTCTTCGC GCCGGCAGAC 



5401 GACCGATGGC TGTGTAGAAG TACTCGCCGA TAGTGGAAAC CGACGCCCCA 
CTGGCTACCG ACACATCTTC ATGAGCGGCT ATCACCTTTG GCTGCGGGGT 



5451 GCACTCGTCC GAGGGCAAAG GAATAGAGTA GATGCCGACC GGGATCTATC 
CGTGAGCAGG CTCCCGTTTC CTTATCTCAT CTACGGCTGG CCCTAGATAG 



5501 GATAAAATAA AAGATTTTAT TTAGTCTCCA GAAAAAGGGG GGAATGAAAG 
CTATTTTATT TTCTAAAATA AATCAGAGGT CTTTTTCCCC CCTTACTTTC 



5551 ACCCCACCTG TAGGTTTGGC AAGCTAGCTT AAGTAACGCC ATTTTGCAAG 
TGGGGTGGAC ATCCAAACCG TTCGATCGAA TTCATTGCGG TAAAACGTTC 



5601 GCATGGAAAA ATACATAACT GAGAATAGAG AAGTTCAGAT CAAGGTCAGG 
CGTACCTTTT TATGTATTGA CTCTTATCTC TTCAAGTCTA GTTCCAGTCC 



5651 AACAGATGGA ACAGCTGAAT ATGGGCCAAA CAGGATATCT GTGGTAAGCA 
TTGTCTACCT TGTCGACTTA TACCCGGTTT GTCCTATAGA CACCATTCGT



5701 GTTCCTGCCC CGGCTCAGGG CCAAGAACAG ATGGAACAGC TGAATATGGG
CAAGGACGGG GCCGAGTCCC GGTTCTTGTC TACCTTGTCG ACTTATACCC 



5751 CCAAACAGGA TATCTGTGGT AAGCAGTTCC TGCCCCGGCT CAGGGCCAAG 
GGTTTGTCCT ATAGACACCA TTCGTCAAGG ACGGGGCCGA GTCCCGGTTC 



5801 AACAGATGGT CCCCAGATGC GGTCCAGCCC TCAGCAGTTT CTAGAGAACC 
TTGTCTACCA GGGGTCTACG CCAGGTCGGG AGTCGTCAAA GATCTCTTGG 



5851 ATCAGATGTT TCCAGGGTGC CCCAAGGACC TGAAATGACC CTGTGCCTTA 
TAGTCTACAA AGGTCCCACG GGGTTCCTGG ACTTTACTGG GACACGGAAT



5901 TTTGAACTAA CCAATCAGTT CGCTTCTCGC TTCTGTTCGC GCGCTTCTGC 
AAACTTGATT GGTTAGTCAA GCGAAGAGCG AAGACAAGCG CGCGAAGACG 



5951 TCCCCGAGCT CAATAAAAGA GCCCACAACC CCTCACTCGG GGCGCCAGTC 
AGGGGCTCGA GTTATTTTCT CGGGTGTTGG GGAGTGAGCC CCGCGGTCAG 



6001 CTCCGATTGA CTGAGTCGCC CGGGTACCCG TGTATCCAAT AAACCCTCTT 
GAGGCTAACT GACTCAGCGG GCCCATGGGC ACATAGGTTA TTTGGGAGAA 



6051 GCAGTTGCAT CCGACTTGTG GTCTCGCTGT TCCTTGGGAG GGTCTCCTCT 
CGTCAACGTA GGCTGAACAC CAGAGCGACA AGGAACCCTC CCAGAGGAGA 



6101 GAGTGATTGA CTACCCGTCA GCGGGGGTCT TTCATTCATG CAGCATGTAT 
CTCACTAACT GATGGGCAGT CGCCCCCAGA AAGTAAGTAC GTCGTACATA 



6151 CAAAATTAAT TTGGTTTTTT TTCTTAAGTA TTTACATTAA ATGGCCATAG 
GTTTTAATTA AACCAAAAAA AAGAATTCAT AAATGTAATT TACCGGTATC



6201 TTGCATTAAT GAATCGGCCA ACGCGCGGGG AGAGGCGGTT TGCGTATTGG 
AACGTAATTA CTTAGCCGGT TGCGCGCCCC TCTCCGCCAA ACGCATAACC 



6251 CGCTCTTCCG CTTCCTCGCT CACTGACTCG CTGCGCTCGG TCGTTCGGCT 
GCGAGAAGGC GAAGGAGCGA GTGACTGAGC GACGCGAGCC AGCAAGCCGA 



6301 GCGGCGAGCG GTATCAGCTC ACTCAAAGGC GGTAATACGG TTATCCACAG
CGCCGCTCGC CATAGTCGAG TGAGTTTCCG CCATTATGCC AATAGGTGTC 



6351 AATCAGGGGA TAACGCAGGA AAGAACATGT GAGCAAAAGG CCAGCAAAAG 
TTAGTCCCCT ATTGCGTCCT TTCTTGTACA CTCGTTTTCC GGTCGTTTTC 



6401 GCCAGGAACC GTAAAAAGGC CGCGTTGCTG GCGTTTTTCC ATAGGCTCCG
CGGTCCTTGG CATTTTTCCG GCGCAACGAC CGCAAAAAGG TATCCGAGGC 



6451 CCCCCCTGAC GAGCATCACA AAAATCGACG CTCAAGTCAG AGGTGGCGAA 
GGGGGGACTG CTCGTAGTGT TTTTAGCTGC GAGTTCAGTC TCCACCGCTT 



6501 ACCCGACAGG ACTATAAAGA TACCAGGCGT TTCCCCCTGG AAGCTCCCTC 
TGGGCTGTCC TGATATTTCT ATGGTCCGCA AAGGGGGACC TTCGAGGGAG



6551 GTGCGCTCTC CTGTTCCGAC CCTGCCGCTT ACCGGATACC TGTCCGCCTT 
CACGCGAGAG GACAAGGCTG GGACGGCGAA TGGCCTATGG ACAGGCGGAA 



6601 TCTCCCTTCG GGAAGCGTGG CGCTTTCTCA TAGCTCACGC TGTAGGTATC
AGAGGGAAGC CCTTCGCACC GCGAAAGAGT ATCGAGTGCG ACATCCATAG



6651 TCAGTTCGGT GTAGGTCGTT CGCTCCAAGC TGGGCTGTGT GCACGAACCC
AGTCAAGCCA CATCCAGCAA GCGAGGTTCG ACCCGACACA CGTGCTTGGG 



6701 CCCGTTCAGC CCGACCGCTG CGCCTTATCC GGTAACTATC GTCTTGAGTC 
GGGCAAGTCG GGCTGGCGAC GCGGAATAGG CCATTGATAG CAGAACTCAG



6751 CAACCCGGTA AGACACGACT TATCGCCACT GGCAGCAGCC ACTGGTAACA
GTTGGGCCAT TCTGTGCTGA ATAGCGGTGA CCGTCGTCGG TGACCATTGT 



6801 GGATTAGCAG AGCGAGGTAT GTAGGCGGTG CTACAGAGTT CTTGAAGTGG
CCTAATCGTC TCGCTCCATA CATCCGCCAC GATGTCTCAA GAACTTCACC 



6851 TGGCCTAACT ACGGCTACAC TAGAAGAACA GTATTTGGTA TCTGCGCTCT
ACCGGATTGA TGCCGATGTG ATCTTCTTGT CATAAACCAT AGACGCGAGA 



6901 GCTGAAGCCA GTTACCTTCG GAAAAAGAGT TGGTAGCTCT TGATCCGGCA 
CGACTTCGGT CAATGGAAGC CTTTTTCTCA ACCATCGAGA ACTAGGCCGT 



6951 AACAAACCAC CGCTGGTAGC GGTGGTTTTT TTGTTTGCAA GCAGCAGATT 
TTGTTTGGTG GCGACCATCG CCACCAAAAA AACAAACGTT CGTCGTCTAA



7001 ACGCGCAGAA AAAAAGGATC TCAAGAAGAT CCTTTGATCT TTTCTACGGG 
TGCGCGTCTT TTTTTCCTAG AGTTCTTCTA GGAAACTAGA AAAGATGCCC



7051 GTCTGACGCT CAGTGGAACG AAAACTCACG TTAAGGGATT TTGGTCATGA 
CAGACTGCGA GTCACCTTGC TTTTGAGTGC AATTCCCTAA AACCAGTACT 



7101 GATTATCAAA AAGGATCTTC ACCTAGATCC TTTTAAATTA AAAATGAAGT 
CTAATAGTTT TTCCTAGAAG TGGATCTAGG AAAATTTAAT TTTTACTTCA 



7151 TTGCGGCCGC AAATCAATCT AAAGTATATA TGAGTAAACT TGGTCTGACA 
AACGCCGGCG TTTAGTTAGA TTTCATATAT ACTCATTTGA ACCAGACTGT 



7201 GTTACCAATG CTTAATCAGT GAGGCACCTA TCTCAGCGAT CTGTCTATTT 
CAATGGTTAC GAATTAGTCA CTCCGTGGAT AGAGTCGCTA GACAGATAAA



7251 CGTTCATCCA TAGTTGCCTG ACTCCCCGTC GTGTAGATAA CTACGATACG
GCAAGTAGGT ATCAACGGAC TGAGGGGCAG CACATCTATT GATGCTATGC



7301 GGAGGGCTTA CCATCTGGCC CCAGTGCTGC AATGATACCG CGAGACCCAC 
CCTCCCGAAT GGTAGACCGG GGTCACGACG TTACTATGGC GCTCTGGGTG



7351 GCTCACCGGC TCCAGATTTA TCAGCAATAA ACCAGCCAGC CGGAAGGGCC
CGAGTGGCCG AGGTCTAAAT AGTCGTTATT TGGTCGGTCG GCCTTCCCGG 



7401 GAGCGCAGAA GTGGTCCTGC AACTTTATCC GCCTCCATCC AGTCTATTAA 
CTCGCGTCTT CACCAGGACG TTGAAATAGG CGGAGGTAGG TCAGATAATT 



7451 TTGTTGCCGG GAAGCTAGAG TAAGTAGTTC GCCAGTTAAT AGTTTGCGCA
AACAACGGCC CTTCGATCTC ATTCATCAAG CGGTCAATTA TCAAACGCGT 



7501 ACGTTGTTGC CATTGCTACA GGCATCGTGG TGTCACGCTC GTCGTTTGGT 
TGCAACAACG GTAACGATGT CCGTAGCACC ACAGTGCGAG CAGCAAACCA 



7551 ATGGCTTCAT TCAGCTCCGG TTCCCAACGA TCAAGGCGAG TTACATGATC 
TACCGAAGTA AGTCGAGGCC AAGGGTTGCT AGTTCCGCTC AATGTACTAG



7601 CCCCATGTTG TGCAAAAAAG CGGTTAGCTC CTTCGGTCCT CCGATCGTTG
GGGGTACAAC ACGTTTTTTC GCCAATCGAG GAAGCCAGGA GGCTAGCAAC 



7651 TCAGAAGTAA GTTGGCCGCA GTGTTATCAC TCATGGTTAT GGCAGCACTG
AGTCTTCATT CAACCGGCGT CACAATAGTG AGTACCAATA CCGTCGTGAC 



7701 CATAATTCTC TTACTGTCAT GCCATCCGTA AGATGCTTTT CTGTGACTGG
GTATTAAGAG AATGACAGTA CGGTAGGCAT TCTACGAAAA GACACTGACC 



7751 TGAGTACTCA ACCAAGTCAT TCTGAGAATA GTGTATGCGG CGACCGAGTT 
ACTCATGAGT TGGTTCAGTA AGACTCTTAT CACATACGCC GCTGGCTCAA 



7801 GCTCTTGCCC GGCGTCAATA CGGGATAATA CCGCGCCACA TAGCAGAACT 
CGAGAACGGG CCGCAGTTAT GCCCTATTAT GGCGCGGTGT ATCGTCTTGA 



7851 TTAAAAGTGC TCATCATTGG AAAACGTTCT TCGGGGCGAA AACTCTCAAG
AATTTTCACG AGTAGTAACC TTTTGCAAGA AGCCCCGCTT TTGAGAGTTC 



7901 GATCTTACCG CTGTTGAGAT CCAGTTCGAT GTAACCCACT CGTGCACCCA 
CTAGAATGGC GACAACTCTA GGTCAAGCTA CATTGGGTGA GCACGTGGGT 



7951 ACTGATCTTC AGCATCTTTT ACTTTCACCA GCGTTTCTGG GTGAGCAAAA 
TGACTAGAAG TCGTAGAAAA TGAAAGTGGT CGCAAAGACC CACTCGTTTT 



8001 ACAGGAAGGC AAAATGCCGC AAAAAAGGGA ATAAGGGCGA CACGGAAATG 
TGTCCTTCCG TTTTACGGCG TTTTTTCCCT TATTCCCGCT GTGCCTTTAC 



8051 TTGAATACTC ATACTCTTCC TTTTTCAATA TTATTGAAGC ATTTATCAGG 
AACTTATGAG TATGAGAAGG AAAAAGTTAT AATAACTTCG TAAATAGTCC 



8101 GTTATTGTCT CATGAGCGGA TACATATTTG AATGTATTTA GAAAAATAAA 
CAATAACAGA GTACTCGCCT ATGTATAAAC TTACATAAAT CTTTTTATTT 



8151 CAAATAGGGG TTCCGCGCAC ATTTC 
GTTTATCCCC AAGGCGCGTG TAAAG



1 CTGCAGCCTG AATATGGGCC AAACAGGATA TCTGTGGTAA GCAGTTCCTG
GACGTCGGAC TTATACCCGG TTTGTCCTAT AGACACCATT CGTCAAGGAC 



51 CCCCGGCTCA GGGCCAAGAA CAGATGGAAC AGCTGAATAT GGGCCAAACA 
GGGGCCGAGT CCCGGTTCTT GTCTACCTTG TCGACTTATA CCCGGTTTGT 



101 GGATATCTGT GGTAAGCAGT TCCTGCCCCG GCTCAGGGCC AAGAACAGAT 
CCTATAGACA CCATTCGTCA AGGACGGGGC CGAGTCCCGG TTCTTGTCTA 



151 GGTCCCCAGA TGCGGTCCAG CCCTCAGCAG TTTCTAGAGA ACCATCAGAT 
CCAGGGGTCT ACGCCAGGTC GGGAGTCGTC AAAGATCTCT TGGTAGTCTA 



201 GTTTCCAGGG TGCCCCAAGG ACCTGAAATG ACCCTGTGCC TTATTTGAAC 
CAAAGGTCCC ACGGGGTTCC TGGACTTTAC TGGGACACGG AATAAACTTG 



251 TAACCAATCA GTTCGCTTCT CGCTTCTGTT CGCGCGCTTC TGCTCCCCGA 
ATTGGTTAGT CAAGCGAAGA GCGAAGACAA GCGCGCGAAG ACGAGGGGCT 



301 GCTCAATAAA AGAGCCCACA ACCCCTCACT CGGGGCGCCA GTCCTCCGAT 
CGAGTTATTT TCTCGGGTGT TGGGGAGTGA GCCCCGCGGT CAGGAGGCTA 



351 TGACTGAGTC GCCCGGGTAC CCGTGTATCC AATAAACCCT CTTGCAGTTG 
ACTGACTCAG CGGGCCCATG GGCACATAGG TTATTTGGGA GAACGTCAAC 



401 CATCCGACTT GTGGTCTCGC TGTTCCTTGG GAGGGTCTCC TCTGAGTGAT
GTAGGCTGAA CACCAGAGCG ACAAGGAACC CTCCCAGAGG AGACTCACTA 



451 TGACTACCCG TCAGCGGGGG TCTTTCATTT GGGGGCTCGT CCGGGATCGG
ACTGATGGGC AGTCGCCCCC AGAAAGTAAA CCCCCGAGCA GGCCCTAGCC 



501 GAGACCCCTG CCCAGGGACC ACCGACCCAC CACCGGGAGG CAAGCTGGCC 
CTCTGGGGAC GGGTCCCTGG TGGCTGGGTG GTGGCCCTCC GTTCGACCGG 



551 AGCAACTTAT CTGTGTCTGT CCGATTGTCT AGTGTCTATG ACTGATTTTA
TCGTTGAATA GACACAGACA GGCTAACAGA TCACAGATAC TGACTAAAAT 



601 TGCGCCTGCG TCGGTACTAG TTAGCTAACT AGCTCTGTAT CTGGCGGACC 
ACGCGGACGC AGCCATGATC AATCGATTGA TCGAGACATA GACCGCCTGG 



651 CGTGGTGGAA CTGACGAGTT CTGAACACCC GGCCGCAACC CTGGGAGACG
GCACCACCTT GACTGCTCAA GACTTGTGGG CCGGCGTTGG GACCCTCTGC



701 TCCCAGGGAC TTTGGGGGCC GTTTTTGTGG CCCGACCTGA GGAAGGGAGT 
AGGGTCCCTG AAACCCCCGG CAAAAACACC GGGCTGGACT CCTTCCCTCA 



751 CGATGTGGAA TCCGACCCCG TCAGGATATG TGGTTCTGGT AGGAGACGAG 
GCTACACCTT AGGCTGGGGC AGTCCTATAC ACCAAGACCA TCCTCTGCTC 



801 AACCTAAAAC AGTTCCCGCC TCCGTCTGAA TTTTTGCTTT CGGTTTGGAA 
TTGGATTTTG TCAAGGGCGG AGGCAGACTT AAAAACGAAA GCCAAACCTT 



851 CCGAAGCCGC GCGTCTTGTC TGCTGCAGCA TCGTTCTGTG TTGTCTCTGT 
GGCTTCGGCG CGCAGAACAG ACGACGTCGT AGCAAGACAC AACAGAGACA 



901 CTGACTGTGT TTCTGTATTT GTCTGAAAAT TAGGGCCAGA CTGTTACCAC 
GACTGACACA AAGACATAAA CAGACTTTTA ATCCCGGTCT GACAATGGTG



951 TCCCTTAAGT TTGACCTTAG GTAACTGGAA AGATGTCGAG CGGCTCGCTC
AGGGAATTCA AACTGGAATC CATTGACCTT TCTACAGCTC GCCGAGCGAG 



1001 ACAACCAGTC GGTAGATGTC AAGAAGAGAC GTTGGGTTAC CTTCTGCTCT 
TGTTGGTCAG CCATCTACAG TTCTTCTCTG CAACCCAATG GAAGACGAGA 



1051 GCAGAATGGC CAACCTTTAA CGTCGGATGG CCGCGAGACG GCACCTTTAA 
CGTCTTACCG GTTGGAAATT GCAGCCTACC GGCGCTCTGC CGTGGAAATT 



1101 CCGAGACCTC ATCACCCAGG TTAAGATCAA GGTCTTTTCA CCTGGCCCGC 
GGCTCTGGAG TAGTGGGTCC AATTCTAGTT CCAGAAAAGT GGACCGGGCG 



1151 ATGGACACCC AGACCAGGTC CCCTACATCG TGACCTGGGA AGCCTTGGCT 
TACCTGTGGG TCTGGTCCAG GGGATGTAGC ACTGGACCCT TCGGAACCGA 



1201 TTTGACCCCC CTCCCTGGGT CAAGCCCTTT GTACACCCTA AGCCTCCGCC 
AAACTGGGGG GAGGGACCCA GTTCGGGAAA CATGTGGGAT TCGGAGGCGG



1251 TCCTCTTCCT CCATCCGCCC CGTCTCTCCC CCTTGAACCT CCTCGTTCGA 
AGGAGAAGGA GGTAGGCGGG GCAGAGAGGG GGAACTTGGA GGAGCAAGCT 



1301 CCCCGCCTCG ATCCTCCCTT TATCCAGCCC TCACTCCTTC TCTAGGCGCC 
GGGGCGGAGC TAGGAGGGAA ATAGGTCGGG AGTGAGGAAG AGATCCGCGG 



1351 GGCCGCTCTA GCCCATTAAT ACGACTCACT ATAGGGCGAT TCGAACACCA 
CCGGCGAGAT CGGGTAATTA TGCTGAGTGA TATCCCGCTA AGCTTGTGGT 



1401 TGCACCATCA TCATCATCAC GTCGACGAAC AGAAACTCAT TTCCGAAGAA 
ACGTGGTAGT AGTAGTAGTG CAGCTGCTTG TCTTTGAGTA AAGGCTTCTT 



1451 GACCTACTCG AGATGGGCGT GATTACGGAT TCACTGGCCG TCGTTTTACA
CTGGATGAGC TCTACCCGCA CTAATGCCTA AGTGACCGGC AGCAAAATGT 



1501 ACGTCGTGAC TGGGAAAACC CTGGCGTTAC CCAACTTAAT CGCCTTGCAG 
TGCAGCACTG ACCCTTTTGG GACCGCAATG GGTTGAATTA GCGGAACGTC 



1551 CACATCCCCC TTTCGCCAGC TGGCGTAATA GCGAAGAGGC CCGCACCGAT 
GTGTAGGGGG AAAGCGGTCG ACCGCATTAT CGCTTCTCCG GGCGTGGCTA 



1601 CGCCCTTCCC AACAGTTACG CAGCCTGAAT GGCGAATGGC GCTTTGCCTG 
GCGGGAAGGG TTGTCAATGC GTCGGACTTA CCGCTTACCG CGAAACGGAC 



1651 GTTTCCGGCA CCAGAAGCGG TGCCGGAAAG CTGGCTGGAG TGCGATCTTC 
CAAAGGCCGT GGTCTTCGCC ACGGCCTTTC GACCGACCTC ACGCTAGAAG 



1701 CTGAGGCCGA TACTGTCGTC GTCCCCTCAA ACTGGCAGAT GCACGGTTAC 
GACTCCGGCT ATGACAGCAG CAGGGGAGTT TGACCGTCTA CGTGCCAATG 



1751 GATGCGCCCA TCTACACCAA CGTGACCTAT CCCATTACGG TCAATCCGCC 
CTACGCGGGT AGATGTGGTT GCACTGGATA GGGTAATGCC AGTTAGGCGG 



1801 GTTTGTTCCC ACGGAGAATC CGACGGGTTG TTACTCGCTC ACATTTAATG
CAAACAAGGG TGCCTCTTAG GCTGCCCAAC AATGAGCGAG TGTAAATTAC 



1851 TTGATGAAAG CTGGCTACAG GAAGGCCAGA CGCGAATTAT TTTTGATGGC 
AACTACTTTC GACCGATGTC CTTCCGGTCT GCGCTTAATA AAAACTACCG



1901 GTTAACTCGG CGTTTCATCT GTGGTGCAAC GGGCGCTGGG TCGGTTACGG
CAATTGAGCC GCAAAGTAGA CACCACGTTG CCCGCGACCC AGCCAATGCC 



1951 CCAGGACAGT CGTTTGCCGT CTGAATTTGA CCTGAGCGCA TTTTTACGCG 
GGTCCTGTCA GCAAACGGCA GACTTAAACT GGACTCGCGT AAAAATGCGC 



2001 CCGGAGAAAA CCGCCTCGCG GTGATGGTGC TGCGCTGGAG TGACGGCAGT 
GGCCTCTTTT GGCGGAGCGC CACTACCACG ACGCGACCTC ACTGCCGTCA 



2051 TATCTGGAAG ATCAGGATAT GTGGCGGATG AGCGGCATTT TCCGTGACGT
ATAGACCTTC TAGTCCTATA CACCGCCTAC TCGCCGTAAA AGGCACTGCA 



2101 CTCGTTGCTG CATAAACCGA CTACACAAAT CAGCGATTTC CATGTTGCCA 
GAGCAACGAC GTATTTGGCT GATGTGTTTA GTCGCTAAAG GTACAACGGT 



2151 CTCGCTTTAA TGATGATTTC AGCCGCGCTG TACTGGAGGC TGAAGTTCAG 
GAGCGAAATT ACTACTAAAG TCGGCGCGAC ATGACCTCCG ACTTCAAGTC 



2201 ATGTGCGGCG AGTTGCGTGA CTACCTACGG GTAACAGTTT CTTTATGGCA
TACACGCCGC TCAACGCACT GATGGATGCC CATTGTCAAA GAAATACCGT 



2251 GGGTGAAACG CAGGTCGCCA GCGGCACCGC GCCTTTCGGC GGTGAAATTA 
CCCACTTTGC GTCCAGCGGT CGCCGTGGCG CGGAAAGCCG CCACTTTAAT 



2301 TCGATGAGCG TGGTGGTTAT GCCGATCGCG TCACACTACG TCTGAACGTC
AGCTACTCGC ACCACCAATA CGGCTAGCGC AGTGTGATGC AGACTTGCAG 



2351 GAAAACCCGA AACTGTGGAG CGCCGAAATC CCGAATCTCT ATCGTGCGGT 
CTTTTGGGCT TTGACACCTC GCGGCTTTAG GGCTTAGAGA TAGCACGCCA 



2401 GGTTGAACTG CACACCGCCG ACGGCACGCT GATTGAAGCA GAAGCCTGCG 
CCAACTTGAC GTGTGGCGGC TGCCGTGCGA CTAACTTCGT CTTCGGACGC 



2451 ATGTCGGTTT CCGCGAGGTG CGGATTGAAA ATGGTCTGCT GCTGCTGAAC 
TACAGCCAAA GGCGCTCCAC GCCTAACTTT TACCAGACGA CGACGACTTG 



2501 GGCAAGCCGT TGCTGATTCG AGGCGTTAAC CGTCACGAGC ATCATCCTCT 
CCGTTCGGCA ACGACTAAGC TCCGCAATTG GCAGTGCTCG TAGTAGGAGA



2551 GCATGGTCAG GTCATGGATG AGCAGACGAT GGTGCAGGAT ATCCTGCTGA
CGTACCAGTC CAGTACCTAC TCGTCTGCTA CCACGTCCTA TAGGACGACT 



2601 TGAAGCAGAA CAACTTTAAC GCCGTGCGCT GTTCGCATTA TCCGAACCAT
ACTTCGTCTT GTTGAAATTG CGGCACGCGA CAAGCGTAAT AGGCTTGGTA



2651 CCGCTGTGGT ACACGCTGTG CGACCGCTAC GGCCTGTATG TGGTGGATGA 
GGCGACACCA TGTGCGACAC GCTGGCGATG CCGGACATAC ACCACCTACT 



2701 AGCCAATATT GAAACCCACG GCATGGTGCC AATGAATCGT CTGACCGATG 
TCGGTTATAA CTTTGGGTGC CGTACCACGG TTACTTAGCA GACTGGCTAC 



2751 ATCCGCGCTG GCTACCGGCG ATGAGCGAAC GCGTAACGCG AATGGTGCAG 
TAGGCGCGAC CGATGGCCGC TACTCGCTTG CGCATTGCGC TTACCACGTC 



2801 CGCGATCGTA ATCACCCGAG TGTGATCATC TGGTCGCTGG GGAATGAATC 
GCGCTAGCAT TAGTGGGCTC ACACTAGTAG ACCAGCGACC CCTTACTTAG



2851 AGGCCACGGC GCTAATCACG ACGCGCTGTA TCGCTGGATC AAATCTGTCG
TCCGGTGCCG CGATTAGTGC TGCGCGACAT AGCGACCTAG TTTAGACAGC



2901 ATCCTTCCCG CCCGGTGCAG TATGAAGGCG GCGGAGCCGA CACCACGGCC
TAGGAAGGGC GGGCCACGTC ATACTTCCGC CGCCTCGGCT GTGGTGCCGG



2951 ACCGATATTA TTTGCCCGAT GTACGCGCGC GTGGATGAAG ACCAGCCCTT
TGGCTATAAT AAACGGGCTA CATGCGCGCG CACCTACTTC TGGTCGGGAA



3001 CCCGGCTGTG CCGAAATGGT CCATCAAAAA ATGGCTTTCG CTACCTGGAG
GGGCCGACAC GGCTTTACCA GGTAGTTTTT TACCGAAAGC GATGGACCTC



3051 AGACGCGCCC GCTGATCCTT TGCGAATACG CCCACGCGAT GGGTAACAGT
TCTGCGCGGG CGACTAGGAA ACGCTTATGC GGGTGCGCTA CCCATTGTCA



3101 CTTGGCGGTT TCGCTAAATA CTGGCAGGCG TTTCGTCAGT ATCCCCGTTT
GAACCGCCAA AGCGATTTAT GACCGTCCGC AAAGCAGTCA TAGGGGCAAA



3151 ACAGGGCGGC TTCGTCTGGG ACTGGGTGGA TCAGTCGCTG ATTAAATATG
TGTCCCGCCG AAGCAGACCC TGACCCACCT AGTCAGCGAC TAATTTATAC



3201 ATGAAAACGG CAACCCGTGG TCGGCTTACG GCGGTGATTT TGGCGATACG
TACTTTTGCC GTTGGGCACC AGCCGAATGC CGCCACTAAA ACCGCTATGC



3251 CCGAACGATC GCCAGTTCTG TATGAACGGT CTGGTCTTTG CCGACCGCAC
GGCTTGCTAG CGGTCAAGAC ATACTTGCCA GACCAGAAAC GGCTGGCGTG



3301 GCCGCATCCA GCGCTGACGG AAGCAAAACA CCAGCAGCAG TTTTTCCAGT
CGGCGTAGGT CGCGACTGCC TTCGTTTTGT GGTCGTCGTC AAAAAGGTCA



3351 TCCGTTTATC CGGGCAAACC ATCGAAGTGA CCAGCGAATA CCTGTTCCGT
AGGCAAATAG GCCCGTTTGG TAGCTTCACT GGTCGCTTAT GGACAAGGCA



3401 CATAGCGATA ACGAGCTCCT GCACTGGATG GTGGCGCTGG ATGGTAAGCC
GTATCGCTAT TGCTCGAGGA CGTGACCTAC CACCGCGACC TACCATTCGG



3451 GCTGGCAAGC GGTGAAGTGC CTCTGGATGT CGCTCCACAA GGTAAACAGT
CGACCGTTCG CCACTTCACG GAGACCTACA GCGAGGTGTT CCATTTGTCA



3501 TGATTGAACT GCCTGAACTA CCGCAGCCGG AGAGCGCCGG GCAACTCTGG
ACTAACTTGA CGGACTTGAT GGCGTCGGCC TCTCGCGGCC CGTTGAGACC



3551 CTCACAGTAC GCGTAGTGCA ACCGAACGCG ACCGCATGGT CAGAAGCCGG
GAGTGTCATG CGCATCACGT TGGCTTGCGC TGGCGTACCA GTCTTCGGCC



3601 GCACATCAGC GCCTGGCAGC AGTGGCGTCT GGCGGAAAAC CTCAGTGTGA
CGTGTAGTCG CGGACCGTCG TCACCGCAGA CCGCCTTTTG GAGTCACACT



3651 CGCTCCCCGC CGCGTCCCAC GCCATCCCGC ATCTGACCAC CAGCGAAATG
GCGAGGGGCG GCGCAGGGTG CGGTAGGGCG TAGACTGGTG GTCGCTTTAC



3701 GATTTTTGCA TCGAGCTGGG TAATAAGCGT TGGCAATTTA ACCGCCAGTC
CTAAAAACGT AGCTCGACCC ATTATTCGCA ACCGTTAAAT TGGCGGTCAG



3751 AGGCTTTCTT TCACAGATGT GGATTGGCGA TAAAAAACAA CTGCTGACGC
TCCGAAAGAA AGTGTCTACA CCTAACCGCT ATTTTTTGTT GACGACTGCG



3801 CGCTGCGCGA TCAGTTCACC CGTGTCGATA GATCTGGAGG TGGTGGCAGC
GCGACGCGCT AGTCAAGTGG GCACAGCTAT CTAGACCTCC ACCACCGTCG 



3851 AGGCCTTGGC GCGCCGGATC CTTAATTAAC AATTGACCGG TAATAATAGG 
TCCGGAACCG CGCGGCCTAG GAATTAATTG TTAACTGGCC ATTATTATCC 



3901 TAGATAAGTG ACTGATTAGA TGCATTTCGA CTAGATCCCT CGACCAATTC 
ATCTATTCAC TGACTAATCT ACGTAAAGCT GATCTAGGGA GCTGGTTAAG 



3951 CGGTTATTTT CCACCATATT GCCGTCTTTT GGCAATGTGA GGGCCCGGAA
GCCAATAAAA GGTGGTATAA CGGCAGAAAA CCGTTACACT CCCGGGCCTT 



4001 ACCTGGCCCT GTCTTCTTGA CGAGCATTCC TAGGGGTCTT TCCCCTCTCG
TGGACCGGGA CAGAAGAACT GCTCGTAAGG ATCCCCAGAA AGGGGAGAGC 



4051 CCAAAGGAAT GCAAGGTCTG TTGAATGTCG TGAAGGAAGC AGTTCCTCTG
GGTTTCCTTA CGTTCCAGAC AACTTACAGC ACTTCCTTCG TCAAGGAGAC



4101 GAAGCTTCTT GAAGACAAAC AACGTCTGTA GCGACCCTTT GCAGGCAGCG 
CTTCGAAGAA CTTCTGTTTG TTGCAGACAT CGCTGGGAAA CGTCCGTCGC 



4151 GAACCCCCCA CCTGGCGACA GGTGCCTCTG CGGCCAAAAG CCACGTGTAT
CTTGGGGGGT GGACCGCTGT CCACGGAGAC GCCGGTTTTC GGTGCACATA 



4201 AAGATACACC TGCAAAGGCG GCACAACCCC AGTGCCACGT TGTGAGTTGG
TTCTATGTGG ACGTTTCCGC CGTGTTGGGG TCACGGTGCA ACACTCAACC



4251 ATAGTTGTGG AAAGAGTCAA ATGGCTCTCC TCAAGCGTAT TCAACAAGGG
TATCAACACC TTTCTCAGTT TACCGAGAGG AGTTCGCATA AGTTGTTCCC



4301 GCTGAAGGAT GCCCAGAAGG TACCCCATTG TATGGGATCT GATCTGGGGC 
CGACTTCCTA CGGGTCTTCC ATGGGGTAAC ATACCCTAGA CTAGACCCCG 



4351 CTCGGTGCAC ATGCTTTACA TGTGTTTAGT CGAGGTTAAA AAACGTCTAG 
GAGCCACGTG TACGAAATGT ACACAAATCA GCTCCAATTT TTTGCAGATC 



4401 GCCCCCCGAA CCACGGGGAC GTGGTTTTCC TTTGAAAAAC ACGATGATAA
CGGGGGGCTT GGTGCCCCTG CACCAAAAGG AAACTTTTTG TGCTACTATT 



4451 TACCATGAAA AAGCCTGAAC TCACCGCGAC GTCTGTCGAG AAGTTTCTGA
ATGGTACTTT TTCGGACTTG AGTGGCGCTG CAGACAGCTC TTCAAAGACT 



4501 TCGAAAAGTT CGACAGCGTC TCCGACCTGA TGCAGCTCTC GGAGGGCGAA
AGCTTTTCAA GCTGTCGCAG AGGCTGGACT ACGTCGAGAG CCTCCCGCTT 



4551 GAATCTCGTG CTTTCAGCTT CGATGTAGGA GGGCGTGGAT ATGTCCTGCG 
CTTAGAGCAC GAAAGTCGAA GCTACATCCT CCCGCACCTA TACAGGACGC



4601 GGTAAATAGC TGCGCCGATG GTTTCTACAA AGATCGTTAT GTTTATCGGC
CCATTTATCG ACGCGGCTAC CAAAGATGTT TCTAGCAATA CAAATAGCCG 



4651 ACTTTGCATC GGCCGCGCTC CCGATTCCGG AAGTGCTTGA CATTGGGGAA 
TGAAACGTAG CCGGCGCGAG GGCTAAGGCC TTCACGAACT GTAACCCCTT 



4701 TTTAGCGAGA GCCTGACCTA TTGCATCTCC CGCCGTGCAC AGGGTGTCAC 
AAATCGCTCT CGGACTGGAT AACGTAGAGG GCGGCACGTG TCCCACAGTG



4751 GTTGCAAGAC CTGCCTGAAA CCGAACTGCC CGCTGTTCTG CAGCCGGTCG
CAACGTTCTG GACGGACTTT GGCTTGACGG GCGACAAGAC GTCGGCCAGC 



4801 CGGAGGCCAT GGATGCGATC GCTGCGGCCG ATCTTAGCCA GACGAGCGGG 
GCCTCCGGTA CCTACGCTAG CGACGCCGGC TAGAATCGGT CTGCTCGCCC 



4851 TTCGGCCCAT TCGGACCGCA AGGAATCGGT CAATACACTA CATGGCGTGA 
AAGCCGGGTA AGCCTGGCGT TCCTTAGCCA GTTATGTGAT GTACCGCACT 



4901 TTTCATATGC GCGATTGCTG ATCCCCATGT GTATCACTGG CAAACTGTGA 
AAAGTATACG CGCTAACGAC TAGGGGTACA CATAGTGACC GTTTGACACT 



4951 TGGACGACAC CGTCAGTGCG TCCGTCGCGC AGGCTCTCGA TGAGCTGATG 
ACCTGCTGTG GCAGTCACGC AGGCAGCGCG TCCGAGAGCT ACTCGACTAC 



5001 CTTTGGGCCG AGGACTGCCC CGAAGTCCGG CACCTCGTGC ACGCGGATTT 
GAAACCCGGC TCCTGACGGG GCTTCAGGCC GTGGAGCACG TGCGCCTAAA 



5051 CGGCTCCAAC AATGTCCTGA CGGACAATGG CCGCATAACA GCGGTCATTG 
GCCGAGGTTG TTACAGGACT GCCTGTTACC GGCGTATTGT CGCCAGTAAC



5101 ACTGGAGCGA GGCGATGTTC GGGGATTCCC AATACGAGGT CGCCAACATC 
TGACCTCGCT CCGCTACAAG CCCCTAAGGG TTATGCTCCA GCGGTTGTAG 



5151 TTCTTCTGGA GGCCGTGGTT GGCTTGTATG GAGCAGCAGA CGCGCTACTT 
AAGAAGACCT CCGGCACCAA CCGAACATAC CTCGTCGTCT GCGCGATGAA 



5201 CGAGCGGAGG CATCCGGAGC TTGCAGGATC GCCGCGGCTC CGGGCGTATA 
GCTCGCCTCC GTAGGCCTCG AACGTCCTAG CGGCGCCGAG GCCCGCATAT 



5251 TGCTCCGCAT TGGTCTTGAC CAACTCTATC AGAGCTTGGT TGACGGCAAT 
ACGAGGCGTA ACCAGAACTG GTTGAGATAG TCTCGAACCA ACTGCCGTTA 



5301 TTCGATGATG CAGCTTGGGC GCAGGGTCGA TGCGACGCAA TCGTCCGATC 
AAGCTACTAC GTCGAACCCG CGTCCCAGCT ACGCTGCGTT AGCAGGCTAG 



5351 CGGAGCCGGG ACTGTCGGGC GTACACAAAT CGCCCGCAGA AGCGCGGCCG 
GCCTCGGCCC TGACAGCCCG CATGTGTTTA GCGGGCGTCT TCGCGCCGGC 



5401 TCTGGACCGA TGGCTGTGTA GAAGTACTCG CCGATAGTGG AAACCGACGC
AGACCTGGCT ACCGACACAT CTTCATGAGC GGCTATCACC TTTGGCTGCG 



5451 CCCAGCACTC GTCCGAGGGC AAAGGAATAG AGTAGATGCC GACCGGGATC 
GGGTCGTGAG CAGGCTCCCG TTTCCTTATC TCATCTACGG CTGGCCCTAG



5501 TATCGATAAA ATAAAAGATT TTATTTAGTC TCCAGAAAAA GGGGGGAATG 
ATAGCTATTT TATTTTCTAA AATAAATCAG AGGTCTTTTT CCCCCCTTAC



5551 AAAGACCCCA CCTGTAGGTT TGGCAAGCTA GCTTAAGTAA CGCCATTTTG 
TTTCTGGGGT GGACATCCAA ACCGTTCGAT CGAATTCATT GCGGTAAAAC 



5601 CAAGGCATGG AAAAATACAT AACTGAGAAT AGAGAAGTTC AGATCAAGGT 
GTTCCGTACC TTTTTATGTA TTGACTCTTA TCTCTTCAAG TCTAGTTCCA 



5651 CAGGAACAGA TGGAACAGCT GAATATGGGC CAAACAGGAT ATCTGTGGTA 
GTCCTTGTCT ACCTTGTCGA CTTATACCCG GTTTGTCCTA TAGACACCAT



5701 AGCAGTTCCT GCCCCGGCTC AGGGCCAAGA ACAGATGGAA CAGCTGAATA
TCGTCAAGGA CGGGGCCGAG TCCCGGTTCT TGTCTACCTT GTCGACTTAT



5751 TGGGCCAAAC AGGATATCTG TGGTAAGCAG TTCCTGCCCC GGCTCAGGGC
ACCCGGTTTG TCCTATAGAC ACCATTCGTC AAGGACGGGG CCGAGTCCCG



5801 CAAGAACAGA TGGTCCCCAG ATGCGGTCCA GCCCTCAGCA GTTTCTAGAG
GTTCTTGTCT ACCAGGGGTC TACGCCAGGT CGGGAGTCGT CAAAGATCTC



5851 AACCATCAGA TGTTTCCAGG GTGCCCCAAG GACCTGAAAT GACCCTGTGC
TTGGTAGTCT ACAAAGGTCC CACGGGGTTC CTGGACTTTA CTGGGACACG



5901 CTTATTTGAA CTAACCAATC AGTTCGCTTC TCGCTTCTGT TCGCGCGCTT
GAATAAACTT GATTGGTTAG TCAAGCGAAG AGCGAAGACA AGCGCGCGAA



5951 CTGCTCCCCG AGCTCAATAA AAGAGCCCAC AACCCCTCAC TCGGGGCGCC
GACGAGGGGC TCGAGTTATT TTCTCGGGTG TTGGGGAGTG AGCCCCGCGG



6001 AGTCCTCCGA TTGACTGAGT CGCCCGGGTA CCCGTGTATC CAATAAACCC
TCAGGAGGCT AACTGACTCA GCGGGCCCAT GGGCACATAG GTTATTTGGG



6051 TCTTGCAGTT GCATCCGACT TGTGGTCTCG CTGTTCCTTG GGAGGGTCTC
AGAACGTCAA CGTAGGCTGA ACACCAGAGC GACAAGGAAC CCTCCCAGAG



6101 CTCTGAGTGA TTGACTACCC GTCAGCGGGG GTCTTTCATT CATGCAGCAT
GAGACTCACT AACTGATGGG CAGTCGCCCC CAGAAAGTAA GTACGTCGTA



6151 GTATCAAAAT TAATTTGGTT TTTTTTCTTA AGTATTTACA TTAAATGGCC
CATAGTTTTA ATTAAACCAA AAAAAAGAAT TCATAAATGT AATTTACCGG



6201 ATAGTTGCAT TAATGAATCG GCCAACGCGC GGGGAGAGGC GGTTTGCGTA
TATCAACGTA ATTACTTAGC CGGTTGCGCG CCCCTCTCCG CCAAACGCAT



6251 TTGGCGCTCT TCCGCTTCCT CGCTCACTGA CTCGCTGCGC TCGGTCGTTC
AACCGCGAGA AGGCGAAGGA GCGAGTGACT GAGCGACGCG AGCCAGCAAG



6301 GGCTGCGGCG AGCGGTATCA GCTCACTCAA AGGCGGTAAT ACGGTTATCC
CCGACGCCGC TCGCCATAGT CGAGTGAGTT TCCGCCATTA TGCCAATAGG



6351 ACAGAATCAG GGGATAACGC AGGAAAGAAC ATGTGAGCAA AAGGCCAGCA
TGTCTTAGTC CCCTATTGCG TCCTTTCTTG TACACTCGTT TTCCGGTCGT



6401 AAAGGCCAGG AACCGTAAAA AGGCCGCGTT GCTGGCGTTT TTCCATAGGC
TTTCCGGTCC TTGGCATTTT TCCGGCGCAA CGACCGCAAA AAGGTATCCG



6451 TCCGCCCCCC TGACGAGCAT CACAAAAATC GACGCTCAAG TCAGAGGTGG
AGGCGGGGGG ACTGCTCGTA GTGTTTTTAG CTGCGAGTTC AGTCTCCACC



6501 CGAAACCCGA CAGGACTATA AAGATACCAG GCGTTTCCCC CTGGAAGCTC
GCTTTGGGCT GTCCTGATAT TTCTATGGTC CGCAAAGGGG GACCTTCGAG



6551 CCTCGTGCGC TCTCCTGTTC CGACCCTGCC GCTTACCGGA TACCTGTCCG
GGAGCACGCG AGAGGACAAG GCTGGGACGG CGAATGGCCT ATGGACAGGC



6601 CCTTTCTCCC TTCGGGAAGC GTGGCGCTTT CTCATAGCTC ACGCTGTAGG
GGAAAGAGGG AAGCCCTTCG CACCGCGAAA GAGTATCGAG TGCGACATCC



6651 TATCTCAGTT CGGTGTAGGT CGTTCGCTCC AAGCTGGGCT GTGTGCACGA
ATAGAGTCAA GCCACATCCA GCAAGCGAGG TTCGACCCGA CACACGTGCT 



6701 ACCCCCCGTT CAGCCCGACC GCTGCGCCTT ATCCGGTAAC TATCGTCTTG
TGGGGGGCAA GTCGGGCTGG CGACGCGGAA TAGGCCATTG ATAGCAGAAC 



6751 AGTCCAACCC GGTAAGACAC GACTTATCGC CACTGGCAGC AGCCACTGGT
TCAGGTTGGG CCATTCTGTG CTGAATAGCG GTGACCGTCG TCGGTGACCA 



6801 AACAGGATTA GCAGAGCGAG GTATGTAGGC GGTGCTACAG AGTTCTTGAA
TTGTCCTAAT CGTCTCGCTC CATACATCCG CCACGATGTC TCAAGAACTT 



6851 GTGGTGGCCT AACTACGGCT ACACTAGAAG AACAGTATTT GGTATCTGCG 
CACCACCGGA TTGATGCCGA TGTGATCTTC TTGTCATAAA CCATAGACGC 



6901 CTCTGCTGAA GCCAGTTACC TTCGGAAAAA GAGTTGGTAG CTCTTGATCC 
GAGACGACTT CGGTCAATGG AAGCCTTTTT CTCAACCATC GAGAACTAGG



6951 GGCAAACAAA CCACCGCTGG TAGCGGTGGT TTTTTTGTTT GCAAGCAGCA 
CCGTTTGTTT GGTGGCGACC ATCGCCACCA AAAAAACAAA CGTTCGTCGT 



7001 GATTACGCGC AGAAAAAAAG GATCTCAAGA AGATCCTTTG ATCTTTTCTA 
CTAATGCGCG TCTTTTTTTC CTAGAGTTCT TCTAGGAAAC TAGAAAAGAT



7051 CGGGGTCTGA CGCTCAGTGG AACGAAAACT CACGTTAAGG GATTTTGGTC
GCCCCAGACT GCGAGTCACC TTGCTTTTGA GTGCAATTCC CTAAAACCAG 



7101 ATGAGATTAT CAAAAAGGAT CTTCACCTAG ATCCTTTTGC GGCCGCAAAT
TACTCTAATA GTTTTTCCTA GAAGTGGATC TAGGAAAACG CCGGCGTTTA 



7151 CAATCTAAAG TATATATGAG TAAACTTGGT CTGACAGTTA CCAATGCTTA 
GTTAGATTTC ATATATACTC ATTTGAACCA GACTGTCAAT GGTTACGAAT 



7201 ATCAGTGAGG CACCTATCTC AGCGATCTGT CTATTTCGTT CATCCATAGT 
TAGTCACTCC GTGGATAGAG TCGCTAGACA GATAAAGCAA GTAGGTATCA



7251 TGCCTGACTC CCCGTCGTGT AGATAACTAC GATACGGGAG GGCTTACCAT 
ACGGACTGAG GGGCAGCACA TCTATTGATG CTATGCCCTC CCGAATGGTA 



7301 CTGGCCCCAG TGCTGCAATG ATACCGCGAG ACCCACGCTC ACCGGCTCCA 
GACCGGGGTC ACGACGTTAC TATGGCGCTC TGGGTGCGAG TGGCCGAGGT 

7351 GATTTATCAG CAATAAACCA GCCAGCCGGA AGGGCCGAGC GCAGAAGTGG 
CTAAATAGTC GTTATTTGGT CGGTCGGCCT TCCCGGCTCG CGTCTTCACC 



7401 TCCTGCAACT TTATCCGCCT CCATCCAGTC TATTAATTGT TGCCGGGAAG 
AGGACGTTGA AATAGGCGGA GGTAGGTCAG ATAATTAACA ACGGCCCTTC 



7451 CTAGAGTAAG TAGTTCGCCA GTTAATAGTT TGCGCAACGT TGTTGCCATT
GATCTCATTC ATCAAGCGGT CAATTATCAA ACGCGTTGCA ACAACGGTAA 



7501 GCTACAGGCA TCGTGGTGTC ACGCTCGTCG TTTGGTATGG CTTCATTCAG 
CGATGTCCGT AGCACCACAG TGCGAGCAGC AAACCATACC GAAGTAAGTC 



7551 CTCCGGTTCC CAACGATCAA GGCGAGTTAC ATGATCCCCC ATGTTGTGCA
GAGGCCAAGG GTTGCTAGTT CCGCTCAATG TACTAGGGGG TACAACACGT



7601 AAAAAGCGGT TAGCTCCTTC GGTCCTCCGA TCGTTGTCAG AAGTAAGTTG
TTTTTCGCCA ATCGAGGAAG CCAGGAGGCT AGCAACAGTC TTCATTCAAC



7651 GCCGCAGTGT TATCACTCAT GGTTATGGCA GCACTGCATA ATTCTCTTAC 
CGGCGTCACA ATAGTGAGTA CCAATACCGT CGTGACGTAT TAAGAGAATG 



7701 TGTCATGCCA TCCGTAAGAT GCTTTTCTGT GACTGGTGAG TACTCAACCA 
ACAGTACGGT AGGCATTCTA CGAAAAGACA CTGACCACTC ATGAGTTGGT 



7751 AGTCATTCTG AGAATAGTGT ATGCGGCGAC CGAGTTGCTC TTGCCCGGCG 
TCAGTAAGAC TCTTATCACA TACGCCGCTG GCTCAACGAG AACGGGCCGC 



7801 TCAATACGGG ATAATACCGC GCCACATAGC AGAACTTTAA AAGTGCTCAT 
AGTTATGCCC TATTATGGCG CGGTGTATCG TCTTGAAATT TTCACGAGTA 



7851 CATTGGAAAA CGTTCTTCGG GGCGAAAACT CTCAAGGATC TTACCGCTGT 
GTAACCTTTT GCAAGAAGCC CCGCTTTTGA GAGTTCCTAG AATGGCGACA 



7901 TGAGATCCAG TTCGATGTAA CCCACTCGTG CACCCAACTG ATCTTCAGCA 
ACTCTAGGTC AAGCTACATT GGGTGAGCAC GTGGGTTGAC TAGAAGTCGT 



7951 TCTTTTACTT TCACCAGCGT TTCTGGGTGA GCAAAAACAG GAAGGCAAAA 
AGAAAATGAA AGTGGTCGCA AAGACCCACT CGTTTTTGTC CTTCCGTTTT 



8001 TGCCGCAAAA AAGGGAATAA GGGCGACACG GAAATGTTGA ATACTCATAC 
ACGGCGTTTT TTCCCTTATT CCCGCTGTGC CTTTACAACT TATGAGTATG 



8051 TCTTCCTTTT TCAATATTAT TGAAGCATTT ATCAGGGTTA TTGTCTCATG 
AGAAGGAAAA AGTTATAATA ACTTCGTAAA TAGTCCCAAT AACAGAGTAC 



8101 AGCGGATACA TATTTGAATG TATTTAGAAA AATAAACAAA TAGGGGTTCC 
TCGCCTATGT ATAAACTTAC ATAAATCTTT TTATTTGTTT ATCCCCAAGG 



8151 GCGCACATTT C 
CGCGTGTAAA G



5' MoMuLV LTR



Figure 14

Figure 15

Figure 16

Figure 17

5' MoMuLV LTR



3' MoMuLV LTR



Stop, Δω β-gal


Figure 19

Figure 20

Figure 21

Figure 22

psi extended packaging signal
GPCR

SerSerSer



Vector for Expression of a GPCR with inserted
Serine/Threonine amino acid sequences as a fusion with β-gal Δα.



FIGURE 24

5' MoMuLV LTR



Vector for Expression of mutant (R170E) β-arrestin2 as a fusion
with β-gal Δω.



FIGURE 25